MemKraft 🧠
Bitemporal memory × empirical tuning: the first self-improvement ledger for AI agents. Your agent's accountable past, in plain Markdown.
🏆 LongMemEval 98.0% — #1 on open-source agent long-term memory benchmarks (Surpasses MemPalace 96.6%, MEMENTO by Microsoft 90.8% · LLM-as-judge · oracle 50 · 3-run semantic majority)
v1.0.2 · Zero-dependency compound knowledge system for AI agents. Auto-extract, classify, search, tune, and time-travel — all in plain Markdown. Debugging is memory. Time travel is memory. Multi-agent handoffs are memory. Facts have bitemporal validity. Memories decay reversibly. Wiki links build graphs. Tuning iterations leave an audit trail.
Plain Markdown source-of-truth · zero deps · zero keys · zero LLM calls inside MemKraft. In 30 seconds:
```shell
pipx install memkraft && memkraft init && memkraft agents-hint claude-code
```
API overview (12 public methods)
| API | Since | Role |
|---|---|---|
track | 0.5 | Start tracking an entity |
update | 0.5 | Append information to an entity |
search | 0.5 | Hybrid search (exact + IDF + fuzzy) |
tier_set | 0.8 | Set tier: core / recall / archival |
fact_add | 0.8 | Record a bitemporal fact |
log_event | 0.8 | Log a timestamped event |
decision_record | 0.9 | Capture a decision with rationale |
evidence_first | 0.9 | Retrieve evidence before acting |
prompt_register | 1.0 | Register a prompt/skill as an entity |
prompt_eval | 1.0 | Record one tuning iteration |
prompt_evidence | 1.0 | Cite past tuning results |
convergence_check | 1.0 | Auto-judge convergence |
Self-improvement loop: register → tune → recall → decide, every step auditable and time-travelable. See MIGRATION.md for upgrading from 0.9.x (zero breaking changes).
30-Second Quickstart
```shell
pip install memkraft
memkraft init                                  # → creates ./memory/ with RESOLVER, TEMPLATES, entities/, ...
memkraft agents-hint claude-code >> AGENTS.md  # your agent is now memory-aware
```
Or scaffold a full project
```shell
memkraft init --template claude-code   # CLAUDE.md + memory/ + examples
memkraft init --template cursor        # .cursorrules + memory/
memkraft init --template mcp           # claude_desktop_config snippet + memory/
memkraft init --template rag           # retrieval-focused structure
memkraft init --template minimal       # just memory/entities/
memkraft templates list                # see all presets
```
Templates are idempotent — re-running init --template X never overwrites your edits.
Or in Python:
```python
from memkraft import MemKraft

mk = MemKraft("./memory")
mk.init()
mk.track("Simon Kim", entity_type="person", source="news")
mk.update("Simon Kim", info="Launched MemKraft 0.8.1", source="PyPI")
mk.search("MemKraft")
```
That's it. Your agent now has persistent memory as plain markdown files.
No API keys. No database. No config. Just .md files you own.
The 1.0 Self-Improvement Loop
Register a prompt/skill, record iterations, cite past evidence, and let MemKraft auto-judge when to stop tuning — all in plain Markdown, no LLM calls inside MemKraft:
```python
from memkraft import MemKraft

mk = MemKraft("./memory")

# 1. register a prompt/skill as a first-class entity
mk.prompt_register(
    "my-skill",
    path="skills/my-skill/SKILL.md",
    owner="zeon",
    tags=["tuning"],
)

# 2. record each empirical iteration (host agent dispatches the run
#    — MemKraft only persists the report)
mk.prompt_eval(
    "my-skill",
    iteration=1,
    scenarios=[{
        "name": "parallel-dispatch",
        "description": "3 subagents at once",
        "requirements": [{"item": "all return", "critical": True}],
    }],
    results=[{
        "scenario": "parallel-dispatch",
        "success": True,
        "accuracy": 85,
        "tool_uses": 5,
        "duration_ms": 2000,
        "unclear_points": ["schema missing"],
        "discretion": [],
    }],
)

# 3. cite past iterations before the next run
mk.prompt_evidence("my-skill", "accuracy regression")

# 4. stop when the last N iterations stabilise
verdict = mk.convergence_check("my-skill", window=2)
# -> {"converged": False, "reason": "insufficient-iters",
#     "iterations_checked": [1],
#     "suggested_next": "patch-and-iterate", ...}
```
Each call leaves an auditable trail on disk: a decision record per iteration, an incident when unclear points pile up, and wiki-links between iterations. Upgrade is zero-breaking from 0.9.x — see MIGRATION.md.
Optional extras
```shell
pip install 'memkraft[mcp]'    # + MCP server (`python -m memkraft.mcp`)
pip install 'memkraft[watch]'  # + auto-reindex on save (`memkraft watch`)
pip install 'memkraft[all]'    # everything
```
Connect Any Agent in 30 Seconds
memkraft agents-hint <target> prints copy-paste-ready integration snippets:
```shell
memkraft agents-hint claude-code  # → CLAUDE.md / AGENTS.md block
memkraft agents-hint openclaw     # → AGENTS.md block for OpenClaw
memkraft agents-hint cursor       # → .cursorrules block
memkraft agents-hint openai       # → Custom GPT / function-calling schema
memkraft agents-hint mcp          # → claude_desktop_config.json snippet
memkraft agents-hint langchain    # → LangChain StructuredTool wrappers
```
Paste the output. Done. Or pipe it straight into your config:
```shell
memkraft agents-hint claude-code >> AGENTS.md
```
See examples/ for runnable variants.
What Makes MemKraft Different
| | MemKraft | Mem0 | Letta |
|---|---|---|---|
| Dependencies | 0 | many | many |
| API key required | No | Yes | Yes |
| Source of truth | Plain .md | Cloud/DB | DB |
| Local-first | ✅ | — | — |
| Git-friendly | ✅ | — | — |
More CLI & Python Usage
```shell
memkraft init
memkraft extract "Simon Kim is the CEO of Hashed in Seoul." --source "news"
memkraft brief "Simon Kim"
memkraft doctor               # 🟢/🟡/🔴 health check with fix hints
memkraft doctor --fix --yes   # auto-repair missing structure (create-only, never deletes)
memkraft stats --export json  # workspace stats for CI dashboards
memkraft mcp doctor           # validate MCP server readiness
memkraft mcp test             # remember→search→recall smoke test
```
MCP (Claude Desktop / Cursor) setup: see docs/mcp-setup.md.
Python Usage
```python
from memkraft import MemKraft

mk = MemKraft("/path/to/memory")
mk.init()  # returns {"created": [...], "exists": [...], "base_dir": "..."}

# Extract entities & facts from text
mk.extract_conversations("Simon Kim is the CEO of Hashed.", source="news")

# Track an entity
mk.track("Simon Kim", entity_type="person", source="news")
mk.update("Simon Kim", info="Launched MemKraft", source="X/@simonkim_nft")

# Search with fuzzy matching
results = mk.search("venture capital", fuzzy=True)

# Agentic multi-hop search with context-aware re-ranking
results = mk.agentic_search(
    "who is the CEO of Hashed",
    context="crypto investment research",  # Conway SMS: same query, different context → different ranking
    file_back=True,  # feedback loop: results auto-filed back to entity timelines
)

# Run health check (5 self-diagnostic assertions)
report = mk.health_check()
# → {"pass_rate": 80.0, "health_score": "A", ...}

# Dream Cycle - nightly maintenance
mk.dream(dry_run=True)
```
More CLI examples - 7 daily patterns that cover 90% of use
```shell
# 1. Extract & Track - auto-detect entities from any text
memkraft extract "Simon Kim is the CEO of Hashed in Seoul." --source "news"
memkraft extract "Revenue grew 85% YoY" --confidence verified --when "bull market"
memkraft track "Simon Kim" --type person --source "X/@simonkim_nft"
memkraft update "Simon Kim" --info "Launched MemKraft" --source "X/@simonkim_nft"

# 2. Search & Recall - find anything in your memory
memkraft search "venture capital" --fuzzy
memkraft search "Seoul VC" --file-back     # feedback loop: auto-file to timelines
memkraft lookup "Simon" --brain-first
memkraft agentic-search "who is the CEO of Hashed" --context "meeting prep"

# 3. Meeting Prep - compile all context before a meeting
memkraft brief "Simon Kim"
memkraft brief "Simon Kim" --file-back     # record brief generation in timeline
memkraft links "Simon Kim"

# 4. Ingest & Classify - inbox → structured pages (safe by default)
memkraft cognify                           # recommend-only; add --apply to move files
memkraft detect "Jack Ma and 马化腾 discussed AI" --dry-run

# 5. Log & Reflect - structured audit trail
memkraft log --event "Deployed v0.3" --tags deploy --importance high
memkraft retro                             # daily Well / Bad / Next retrospective

# 6. Maintain & Heal - Dream Cycle keeps memory healthy
memkraft health-check                      # 5 assertions → pass rate + health score (A/B/C/D)
memkraft dream --dry-run                   # nightly: sources, duplicates, bloated pages
memkraft resolve-conflicts --strategy confidence  # resolve contradictory facts
memkraft diff                              # what changed since last maintenance?
memkraft open-loops                        # find all unresolved items

# 7. Debug Hypothesis Tracking - "Debugging is Memory"
memkraft debug start "API returns 500 on POST /users"
memkraft debug hypothesis "Database connection timeout"
memkraft debug evidence "DB pool healthy" --result contradicts
memkraft debug reject --reason "DB is fine"
memkraft debug hypothesis "Request validation missing"
memkraft debug evidence "Empty POST triggers 500" --result supports
memkraft debug confirm
memkraft debug end "Added request body validation"
memkraft debug search-rejected "timeout"   # avoid past mistakes
```
Features
Ingestion & Extraction
| Feature | Description |
|---|---|
| Auto-extract | Pipe any text in, get entities + facts out. Regex-based NER for EN, KR, CN, JP - no LLM calls. |
| CJK detection | 806 stopwords, 100 Chinese surnames, 85 Japanese surnames, Korean particle stripping. |
| Cognify pipeline | Routes inbox/ items to the right directory. Recommend-only by default - --apply to move. |
| Fact registry | Extracts currencies, percentages, dates, quantities into a cross-domain index. |
| Originals capture | Save raw text verbatim - no paraphrasing. |
| Confidence levels | Tag facts as verified / experimental / hypothesis. Dream Cycle warns on untagged facts. |
| Applicability conditions | --when "condition" --when-not "condition" - facts get When: / When NOT: metadata. |
Search & Retrieval
| Feature | Description |
|---|---|
| Fuzzy search | difflib.SequenceMatcher-based. Works offline, zero setup. |
| Brain-first lookup | Searches entities → notes → decisions → meetings. Stops after sufficient high-relevance results. |
| Agentic search | Multi-hop: decompose query → search → traverse [[wiki-links]] → re-rank by tier/recency/confidence/applicability. |
| Goal-weighted re-ranking | Conway SMS: same query with different --context produces different rankings. |
| Feedback loop | --file-back: search results auto-filed back to entity timelines (compound interest for memory). |
| Progressive disclosure | 3-level query: L1 index (~50 tokens) → L2 section headers → L3 full file. |
| Backlinks | [[entity-name]] cross-references. See every page that references an entity. |
| Link suggestions | Auto-suggest missing [[wiki-links]] based on known entity names. |
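The fuzzy-search row above names `difflib.SequenceMatcher` as the stdlib primitive. A minimal sketch of that technique, assuming an illustrative per-word scoring scheme and a 0.6 threshold (not MemKraft's actual internals):

```python
# Zero-dependency fuzzy matching with difflib.SequenceMatcher.
# Scoring per word and the 0.6 cutoff are illustrative assumptions.
from difflib import SequenceMatcher

def fuzzy_score(query: str, text: str) -> float:
    """Best similarity ratio between the query and any word in the text."""
    q = query.lower()
    return max(
        (SequenceMatcher(None, q, w.lower()).ratio() for w in text.split()),
        default=0.0,
    )

def fuzzy_search(query: str, lines: list[str], threshold: float = 0.6):
    """Return (line, score) pairs above the threshold, best first."""
    hits = [(line, fuzzy_score(query, line)) for line in lines]
    return sorted(
        [(l, s) for l, s in hits if s >= threshold],
        key=lambda p: p[1], reverse=True,
    )

lines = ["Simon Kim leads Hashed", "Revenue grew 85% YoY"]
hits = fuzzy_search("Hashd", lines)  # typo still matches "Hashed"
```

Because `SequenceMatcher` works on raw character overlap, this degrades gracefully on typos with no index or embedding step.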
Structure & Organization
| Feature | Description |
|---|---|
| Compiled Truth + Timeline | Dual-layer entity model: mutable current state + append-only audit trail with [Source:] tags. |
| Memory tiers | Core / Recall / Archival - explicit context window priority. promote to reclassify. |
| Memory type classification | 8 types: identity, belief, preference, relationship, skill, episodic, routine, transient. |
| Type-aware decay | Identity memories decay 10x slower than routine memories. Differential decay multipliers. |
| RESOLVER.md | MECE classification tree - every piece of knowledge has exactly one destination. |
| Source attribution | Every fact tagged with [Source: who, when, how]. Enforced by Dream Cycle. |
| Dialectic synthesis | Auto-detect contradictory facts during extract, tag [CONFLICT], generate CONFLICTS.md. |
| Conflict resolution | `resolve-conflicts` with strategies `newest`, `confidence`, `keep-both`, `prompt`. |
| Live Notes | Persistent tracking for people and companies. Auto-incrementing updates + timeline. |
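The type-aware decay row above can be sketched as a per-type multiplier on a shared exponential curve. The multiplier values and base rate below are assumptions for illustration; the document only states that identity decays 10x slower than routine:

```python
# Type-aware differential decay: each of the 8 memory types scales a
# shared base decay rate. Multipliers here are hypothetical.
import math

DECAY_MULTIPLIER = {
    "identity": 0.1, "belief": 0.3, "preference": 0.4, "relationship": 0.4,
    "skill": 0.5, "episodic": 0.8, "routine": 1.0, "transient": 2.0,
}

def decayed_weight(memory_type: str, days_unaccessed: int,
                   base_rate: float = 0.01) -> float:
    """Exponential decay scaled by the memory type's multiplier."""
    rate = base_rate * DECAY_MULTIPLIER[memory_type]
    return math.exp(-rate * days_unaccessed)

# after 90 untouched days, identity barely fades while routine halves twice
identity_w = decayed_weight("identity", 90)
routine_w = decayed_weight("routine", 90)
```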
Maintenance & Audit
| Feature | Description |
|---|---|
| Dream Cycle | Nightly auto-maintenance: missing sources, thin pages, duplicates, inbox age, bloated pages, daily notes. |
| Debug Hypothesis Tracking | OBSERVE → HYPOTHESIZE → EXPERIMENT → CONCLUDE flow. Track hypotheses, evidence, rejections. Auto-switch warning after 2 failures. Search past sessions to avoid repeating failed approaches. |
| Health Check | 5 self-diagnostic assertions: source attribution, orphan facts, duplicates, inbox freshness, unresolved conflicts. Pass rate % + health score (A/B/C/D). |
| Memory decay | Older, unaccessed memories naturally decay - type-aware differential curves. |
| Fact dedup | Detects and merges duplicate facts across entities. |
| Auto-summarize | Condenses bloated pages while preserving key information. |
| Diff tracking | See exactly what changed since the last Dream Cycle. |
| Open loop tracking | Finds all pending / TODO / FIXME items across memory. |
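Open-loop tracking, as described in the last row above, reduces to a marker scan across memory pages. A sketch, assuming the marker list (`TODO` / `FIXME` / `pending`) and hit shape, which are not guaranteed to match MemKraft's exact output:

```python
# Scan markdown pages for open-loop markers and report file + line.
import re

OPEN_LOOP = re.compile(r"\b(TODO|FIXME|pending)\b", re.IGNORECASE)

def find_open_loops(pages: dict[str, str]) -> list[dict]:
    """pages maps filename -> markdown text; returns one hit per line."""
    hits = []
    for name, text in pages.items():
        for lineno, line in enumerate(text.splitlines(), 1):
            if OPEN_LOOP.search(line):
                hits.append({"file": name, "line": lineno,
                             "text": line.strip()})
    return hits

pages = {
    "tasks/deploy.md": "Done: build\nTODO: rotate keys\n",
    "notes.md": "all clear\n",
}
loops = find_open_loops(pages)
```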
Logging & Reflection
| Feature | Description |
|---|---|
| Session logging | JSONL event trail with tags, importance, entity, task, and decision fields. |
| Daily retrospective | Auto-generated Well / Bad / Next from session events + file changes. |
| Decision distillation | Scans events and notes for decision candidates. EN + KR keyword matching. |
| Meeting briefs | One command compiles entity info, timeline, open threads, and a pre-meeting checklist. |
Debugging
| Feature | Description |
|---|---|
| Debug Hypothesis Tracking | OBSERVE→HYPOTHESIZE→EXPERIMENT→CONCLUDE loop with persistent failure memory. |
📸 Memory Snapshots & Time Travel (v0.5.1)
| Feature | Description |
|---|---|
| Snapshot | Create a point-in-time manifest of all memory files (hash, size, summary, sections, fact count, link count). Optionally embed full content. |
| Snapshot List | List all saved snapshots, newest first, with labels and metadata. |
| Snapshot Diff | Compare two snapshots (or snapshot vs live state). Shows added, removed, modified, unchanged files with byte deltas. |
| Time Travel | Search memory as it was at a past snapshot. Answer "what did I know about X on March 1st?" |
| Entity Timeline | Track how a specific entity evolved across all snapshots — new, modified, unchanged, deleted states. |
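The snapshot and diff rows above rest on a simple mechanic: a manifest of content hashes per file, diffed by set operations. A sketch, with field names assumed for illustration:

```python
# Content-hash snapshot manifest + diff (added / removed / modified).
import hashlib

def take_snapshot(files: dict[str, str]) -> dict:
    """files maps path -> content; manifest stores hash + size per file."""
    return {path: {"hash": hashlib.sha256(c.encode()).hexdigest(),
                   "size": len(c.encode())}
            for path, c in files.items()}

def snapshot_diff(old: dict, new: dict) -> dict:
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    modified = sorted(p for p in set(old) & set(new)
                      if old[p]["hash"] != new[p]["hash"])
    unchanged = len(old.keys() & new.keys()) - len(modified)
    return {"added": added, "removed": removed,
            "modified": modified, "unchanged_count": unchanged}

snap_a = take_snapshot({"a.md": "v1", "b.md": "same"})
snap_b = take_snapshot({"a.md": "v2", "b.md": "same", "c.md": "new"})
diff = snapshot_diff(snap_a, snap_b)
```

Hashing instead of storing content keeps manifests small; `include_content=True` would additionally embed the bytes for full time travel.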
🧠 Channel Context Memory + Task Continuity + Agent Working Memory (v0.5.4)
| Feature | Description |
|---|---|
| Channel Context Memory | Per-channel context persistence. Save/load/update context keyed by channel ID (e.g. telegram-46291309). Stored in .memkraft/channels/{channel_id}.json. |
| Task Continuity Register | Task lifecycle tracking with full history. task_start → task_update → task_complete + task_history + task_list. Each update stores timestamp + status + note. Stored in .memkraft/tasks/{task_id}.json. |
| Agent Working Memory | Per-agent persistent context. agent_save / agent_load any working memory dict. Stored in .memkraft/agents/{agent_id}.json. |
| agent_inject() | The key feature. Merges agent working memory + channel context + task history into a single ready-to-inject prompt block. Use this to give sub-agents full situational awareness. |
```python
from memkraft import MemKraft

mk = MemKraft("/path/to/memory")

# Save channel context
mk.channel_save("telegram-46291309", {
    "summary": "DM with Simon",
    "recent_tasks": ["vibekai deploy", "memkraft v0.5.4"],
    "preferences": {"language": "ko"},
})

# Register a task
mk.task_start("deploy-001", "Deploy vibekai to production",
              channel_id="telegram-46291309", agent="zeon")
mk.task_update("deploy-001", "active", "vercel build passed")

# Save agent working memory
mk.agent_save("zeon", {
    "key_context": "Simon's AI assistant",
    "active_tasks": ["deploy-001"],
    "learned": ["always report completion", "no silence"],
})

# Inject merged context block into a sub-agent instruction
block = mk.agent_inject("zeon", channel_id="telegram-46291309",
                        task_id="deploy-001")
print(block)
# ## Agent Working Memory
# - **key_context:** Simon's AI assistant
# - **active_tasks:** deploy-001
# ...
# ## Channel Context
# - **summary:** DM with Simon
# ...
# ## Task Context
# - **Task:** Deploy vibekai to production
# - **Status:** active
# - **History:**
#   - [2026-04-15T...] active: vercel build passed
```
```python
from memkraft import MemKraft

mk = MemKraft("/path/to/memory")

# Take a snapshot before a big operation
snap = mk.snapshot(label="before-migration", include_content=True)

# ... time passes, memory changes ...

# What changed?
diff = mk.snapshot_diff(snap["snapshot_id"])  # vs live state
# → {added: [...], removed: [...], modified: [...], unchanged_count: 42}

# Search memory as it was at that snapshot
results = mk.time_travel("venture capital", snapshot_id=snap["snapshot_id"])

# How did an entity evolve over time?
timeline = mk.snapshot_entity("Simon Kim")
# → [{snapshot_id, timestamp, fact_count, size, change_type: "new"}, ...]
```
🐛 Debugging is Memory
Debugging insights are too valuable to lose in scrollback. MemKraft treats the entire debug process as first-class memory.
The debug-hypothesis loop - inspired by Shen Huang's scientific debugging method:
```
OBSERVE → HYPOTHESIZE → EXPERIMENT → CONCLUDE
   ↑                                    |
   |            rejected?               |
   +←──────── next hypothesis ←─────────+
                    |
        all rejected? → back to OBSERVE
```
- `mk.start_debug("bug description")` - begin a tracked session
- `mk.log_hypothesis(bug_id, "theory", "evidence")` - record each theory
- `mk.log_evidence(bug_id, hyp_id, "test result", "supports|contradicts")` - track proof
- `mk.reject_hypothesis(bug_id, hyp_id, "reason")` - mark failed approaches
- `mk.confirm_hypothesis(bug_id, hyp_id)` - lock in the root cause
- `mk.end_debug(bug_id, "resolution")` - close session, feed back to memory
Why it matters: rejected hypotheses are permanent memory. Next time you hit a similar bug, MemKraft surfaces what you already tried - no more repeating the same failed approaches.
API Reference
MemKraft(base_dir=None)
Initialize the memory system. If base_dir is not provided, uses $MEMKRAFT_DIR or ./memory.
```python
from memkraft import MemKraft

mk = MemKraft("/path/to/memory")
```
Core Methods
| Method | Description |
|---|---|
init(path="") | Create memory directory structure with all subdirectories and templates. |
track(name, entity_type="person", source="") | Start tracking an entity. Creates a live-note in live-notes/. |
update(name, info, source="manual") | Append new information to a tracked entity's timeline. |
brief(name, save=False, file_back=False) | Generate a meeting brief for an entity. file_back=True records the brief generation in the entity timeline. |
promote(name, tier="core") | Change memory tier: core / recall / archival. |
list_entities() | List all tracked entities with their types. |
Extraction & Classification
| Method | Description |
|---|---|
extract_conversations(input_text, source="", dry_run=False, confidence="experimental", applicability="") | Extract entities and facts from text. confidence: verified / experimental / hypothesis. applicability: "When: X \| When NOT: Y". |
detect(text, source="", dry_run=False) | Detect entities in text (EN/KR/CN/JP). |
cognify(dry_run=False, apply=False) | Route inbox items to structured directories. Recommend-only by default. |
extract_facts_registry(text="") | Extract numeric/date facts into cross-domain index. |
detect_conflicts(entity_name, new_fact, source="") | Check for contradictory facts and tag with [CONFLICT]. |
resolve_conflicts(strategy="newest", dry_run=False) | Resolve conflicts. Strategies: newest, confidence, keep-both, prompt. |
classify_memory_type(text) | Classify text into one of 8 memory types. |
Search
| Method | Description |
|---|---|
search(query, fuzzy=False) | Search memory files. Returns list of {file, score, context, line}. |
agentic_search(query, max_hops=2, json_output=False, context="", file_back=False) | Multi-hop search with query decomposition, link traversal, and goal-weighted re-ranking. context enables Conway SMS reconstructive ranking. file_back enables the feedback loop. |
lookup(query, json_output=False, brain_first=False, full=False) | Brain-first lookup: stop early on high-relevance hits unless full=True. |
query(query="", level=1, recent=0, tag="", date="") | Progressive disclosure: L1=index, L2=sections, L3=full. |
links(name) | Show backlinks to an entity ([[wiki-links]]). |
Maintenance
| Method | Description |
|---|---|
dream(date=None, dry_run=False, resolve_conflicts=False) | Run Dream Cycle. 6 health checks + optional conflict resolution. |
health_check() | Run 5 self-diagnostic assertions. Returns {pass_rate, health_score, assertions}. |
decay(days=90, dry_run=False) | Flag stale facts. Type-aware: identity decays 10x slower than routine. |
dedup(dry_run=False) | Find and merge duplicate facts. |
summarize(name=None, max_length=500) | Auto-summarize bloated entity pages. |
diff() | Show changes since last Dream Cycle. |
open_loops(dry_run=False) | Find unresolved items (TODO/FIXME/pending). |
build_index() | Build memory index at .memkraft/index.json. |
suggest_links() | Suggest missing [[wiki-links]]. |
Logging
| Method | Description |
|---|---|
log_event(event, tags="", importance="normal", entity="", task="", decision="") | Log a session event to JSONL. |
log_read(date=None) | Read session events for a date. |
retro(dry_run=False) | Generate daily retrospective (Well / Bad / Next). |
distill_decisions() | Scan for decision candidates in events and notes. |
Debug Hypothesis Tracking
| Method | Description |
|---|---|
start_debug(bug_description) | Start a new debug session. Returns {bug_id, file, status}. |
log_hypothesis(bug_id, hypothesis, evidence="", status="testing") | Log a hypothesis. Auto-increments ID (H1, H2, ...). |
get_hypotheses(bug_id) | Get all hypotheses for a debug session. |
reject_hypothesis(bug_id, hypothesis_id, reason="") | Reject a hypothesis. Preserved permanently for future reference. |
confirm_hypothesis(bug_id, hypothesis_id) | Confirm a hypothesis. Feeds back into memory. |
log_evidence(bug_id, hypothesis_id, evidence_text, result="neutral") | Log evidence. Result: supports / contradicts / neutral. |
get_evidence(bug_id, hypothesis_id="") | Get evidence entries, optionally filtered by hypothesis. |
end_debug(bug_id, resolution) | End session with resolution. Auto-feeds to memory. |
get_debug_status(bug_id) | Get current session status and hypothesis counts. |
debug_history(limit=10) | List past debug sessions. |
search_debug_sessions(query) | Search past sessions by description/hypothesis/resolution. |
search_rejected_hypotheses(query) | Search rejected hypotheses — anti-pattern detector. |
Memory Snapshots & Time Travel
| Method | Description |
|---|---|
snapshot(label="", include_content=False) | Create a point-in-time snapshot of all memory files. Returns {snapshot_id, timestamp, label, file_count, total_bytes, path}. |
snapshot_list() | List all saved snapshots, newest first. |
snapshot_diff(snapshot_a, snapshot_b="") | Compare two snapshots, or a snapshot vs live state. Returns {added, removed, modified, unchanged_count}. |
time_travel(query, snapshot_id="", date="") | Search memory as it was at a past snapshot. Supports search by snapshot ID or date. |
snapshot_entity(name) | Track how a specific entity evolved across all snapshots (new/modified/unchanged/deleted). |
CLI Reference
memkraft <command> [options]
Commands
| Command | Description |
|---|---|
init [--path DIR] | Initialize memory structure |
extract TEXT [--source S] [--dry-run] [--confidence C] [--when W] [--when-not W] | Auto-extract entities and facts |
detect TEXT [--source S] [--dry-run] | Detect entities in text (EN/KR/CN/JP) |
track NAME [--type T] [--source S] | Start tracking an entity |
update NAME --info INFO [--source S] | Update a tracked entity |
list | List all tracked entities |
brief NAME [--save] [--file-back] | Generate meeting brief |
promote NAME [--tier T] | Change memory tier (core/recall/archival) |
search QUERY [--fuzzy] [--file-back] | Search memory files |
agentic-search QUERY [--max-hops N] [--json] [--context C] [--file-back] | Multi-hop agentic search |
lookup QUERY [--json] [--brain-first] [--full] | Brain-first lookup |
query [QUERY] [--level 1\|2\|3] [--recent N] [--tag T] [--date D] | Progressive disclosure query |
links NAME | Show backlinks to an entity |
cognify [--dry-run] [--apply] | Process inbox into structured pages |
log --event E [--tags T] [--importance I] [--entity E] [--task T] [--decision D] | Log session event |
log --read [--date D] | Read session events |
retro [--dry-run] | Daily retrospective |
distill-decisions | Scan for decision candidates |
health-check | Run 5 self-diagnostic assertions → health score |
dream [--date D] [--dry-run] [--resolve-conflicts] | Run Dream Cycle (nightly maintenance) |
resolve-conflicts [--strategy S] [--dry-run] | Resolve fact conflicts |
decay [--days N] [--dry-run] | Flag stale facts |
dedup [--dry-run] | Find and merge duplicates |
summarize [NAME] [--max-length N] | Auto-summarize bloated pages |
diff | Show changes since last Dream Cycle |
open-loops [--dry-run] | Find unresolved items |
index | Build memory index |
suggest-links | Suggest missing wiki-links |
extract-facts [TEXT] | Extract numeric/date facts |
debug start DESC | Start a new debug session (OBSERVE) |
debug hypothesis TEXT [--bug-id ID] [--evidence E] | Log a hypothesis (HYPOTHESIZE) |
debug evidence TEXT [--bug-id ID] [--hypothesis-id H] [--result R] | Log evidence (supports/contradicts/neutral) |
debug reject [--bug-id ID] [--hypothesis-id H] [--reason R] | Reject current hypothesis |
debug confirm [--bug-id ID] [--hypothesis-id H] | Confirm current hypothesis |
debug status [--bug-id ID] | Show debug session status |
debug history [--limit N] | List past debug sessions |
debug end RESOLUTION [--bug-id ID] | End debug session (CONCLUDE) |
debug search QUERY | Search past debug sessions |
debug search-rejected QUERY | Search rejected hypotheses (anti-patterns) |
snapshot [--label L] [--include-content] | Create a point-in-time memory snapshot |
snapshot-list | List all saved snapshots (newest first) |
snapshot-diff SNAP_A [SNAP_B] | Compare two snapshots or snapshot vs live state |
time-travel QUERY [--snapshot ID] [--date YYYY-MM-DD] | Search memory as it was at a past snapshot |
snapshot-entity NAME | Show how an entity evolved across snapshots |
selfupdate [--dry-run] | Self-upgrade MemKraft via pip when newer version on PyPI |
doctor [--check-updates] | Health check; with --check-updates also reports PyPI version status |
Staying Up To Date
MemKraft ships an opt-in self-upgrade flow so agents (and humans) never silently drift behind PyPI:
```shell
memkraft doctor --check-updates  # 🟢 up to date / 🟡 update available / 🔴 PyPI unreachable
memkraft selfupdate              # pip install -U memkraft when newer
memkraft selfupdate --dry-run    # check only
```
Classic still works:
```shell
pip install -U memkraft
```
For agents: add memkraft doctor --check-updates to your weekly skill or heartbeat — if it reports 🟡, ask the human before running memkraft selfupdate. Never auto-upgrade without explicit consent.
For maintainers: pushing a vX.Y.Z git tag triggers .github/workflows/release.yml, which builds, verifies (twine check), publishes to PyPI, and cuts a GitHub Release. Requires a PYPI_API_TOKEN repo secret — add it at Settings → Secrets and variables → Actions.
Architecture
```
Raw Input ──▶ Extract ──▶ Classify ──▶ Forge ──▶ Compound Knowledge
     ▲            │                                     │
     │       Confidence                                 │
     │      Applicability                               │
     │                                                  │
     └──── Feedback Loop ◄── Brain-first recall ◄───────┘

         maintained by Dream Cycle + Health Check
```
How It Works
Zero dependencies. Built entirely from Python stdlib: re for NER, difflib for fuzzy search, json for structured data, pathlib for file ops. No vector DB, no LLM calls at runtime, no framework lock-in.
Compiled Truth + Timeline. Every entity has two layers: a mutable Compiled Truth (current state) and an append-only Timeline with [Source:] tags. You get both "what we know now" and "how we got here."
Auto-Extract pipeline. Multi-stage NER: English Title Case → Korean particle stripping → Chinese surname detection (100 surnames) → Japanese surname detection (85 surnames) → fact extraction (X is/was/leads Y) → stopword filtering (806 KR/CN/JP stopwords).
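The English stage of that pipeline can be sketched as a Title Case regex plus stopword filtering. The pattern and the tiny stopword subset below are illustrative assumptions, not MemKraft's actual rules:

```python
# Regex NER sketch: capture Title Case spans, drop stopword-only spans.
import re

STOPWORDS = {"The", "This", "In", "On", "At"}  # tiny illustrative subset
TITLE_CASE = re.compile(r"\b(?:[A-Z][a-z]+)(?:\s+[A-Z][a-z]+)*\b")

def extract_entities(text: str) -> list[str]:
    spans = TITLE_CASE.findall(text)
    # drop spans made only of stopwords (e.g. sentence-initial "The")
    return [s for s in spans
            if not all(w in STOPWORDS for w in s.split())]

ents = extract_entities("Simon Kim is the CEO of Hashed in Seoul.")
```

Note that all-caps tokens like "CEO" fall outside this particular pattern; a real pipeline would add further stages for acronyms and the CJK paths.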
Goal-weighted re-ranking (Conway SMS). agentic_search("X", context="meeting prep") and agentic_search("X", context="investment analysis") return different rankings from the same data. Memory type, confidence, and applicability conditions all factor into scoring.
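A minimal sketch of that goal-weighted re-ranking idea: the same base hits get different final scores depending on the query context. The tag-overlap signal and the 0.5 weight are assumptions for illustration:

```python
# Context-dependent re-ranking: boost hits whose tags overlap the
# context words. Weights and signal are hypothetical.
def rerank(hits: list[dict], context: str) -> list[dict]:
    """hits: {file, score, tags}; returns hits sorted by boosted score."""
    ctx_words = set(context.lower().split())
    def final(h: dict) -> float:
        overlap = len(ctx_words & {t.lower() for t in h["tags"]})
        return h["score"] + 0.5 * overlap
    return sorted(hits, key=final, reverse=True)

hits = [
    {"file": "meetings/simon.md", "score": 0.6, "tags": ["meeting", "prep"]},
    {"file": "decisions/fund.md", "score": 0.7, "tags": ["investment"]},
]
meeting_rank = rerank(hits, "meeting prep")
invest_rank = rerank(hits, "investment analysis")
```

Same data, two contexts, two orderings: that is the reconstructive-ranking effect the paragraph describes.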
Feedback loop. --file-back files search results back into entity timelines. Each query makes future queries richer - compound interest for memory.
Health Check. 5 assertions: (1) source attribution, (2) no orphan facts, (3) no duplicates, (4) inbox freshness, (5) no unresolved conflicts. Returns a pass rate and letter grade (A/B/C/D).
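Turning the 5 assertion results into a report is straightforward; the letter-grade cutoffs below are assumptions (chosen so that 4/5 = 80% maps to "A", matching the sample output earlier in this README):

```python
# Pass rate + letter grade from named assertion results.
# Grade cutoffs (A>=80, B>=60, C>=40, else D) are hypothetical.
def health_score(assertions: dict[str, bool]) -> dict:
    passed = sum(assertions.values())
    rate = 100.0 * passed / len(assertions)
    grade = ("A" if rate >= 80 else "B" if rate >= 60
             else "C" if rate >= 40 else "D")
    return {"pass_rate": rate, "health_score": grade,
            "assertions": assertions}

report = health_score({
    "source_attribution": True,
    "no_orphan_facts": True,
    "no_duplicates": True,
    "inbox_fresh": True,
    "no_unresolved_conflicts": False,
})
```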
Memory Directory Structure
memory/
├── .memkraft/ # Internal state (index.json, timestamps)
├── sessions/ # Structured event logs (YYYY-MM-DD.jsonl)
├── RESOLVER.md # Classification decision tree (MECE)
├── TEMPLATES.md # Page templates with tier labels
├── CONFLICTS.md # Auto-generated conflict report
├── open-loops.md # Unresolved items hub
├── fact-registry.md # Cross-domain numeric/date facts
├── YYYY-MM-DD.md # Daily notes
├── entities/ # People, companies, concepts (Tier: recall)
├── live-notes/ # Persistent tracking targets (Tier: core)
├── decisions/ # Decision records with rationale
├── originals/ # Captured verbatim - no paraphrasing
├── inbox/ # Quick capture before classification
├── tasks/ # Work-in-progress context
├── meetings/ # Briefs and notes
└── debug/ # Debug sessions (DEBUG-YYYYMMDD-HHMMSS.md)
Comparison
| | MemKraft | Mem0 | Letta |
|---|---|---|---|
| Storage | Plain Markdown | Vector + Graph DB | DB-backed |
| Dependencies | Zero | Vector DB + API | DB + runtime |
| Offline / git-friendly | ✅ | ❌ | ❌ |
| Auto-extract (EN/KR/CN/JP) | ✅ | ✅ (LLM) | - |
| Agentic search | ✅ | - | - |
| Goal-weighted re-ranking | ✅ | - | - |
| Feedback loop | ✅ | - | - |
| Confidence levels | ✅ | - | - |
| Health check | ✅ | - | - |
| Conflict detection & resolution | ✅ | - | - |
| Source attribution | Required | - | - |
| Dream Cycle | ✅ | - | - |
| Memory tiers | ✅ | - | ✅ |
| Type-aware decay | ✅ | - | - |
| Debug hypothesis tracking | ✅ | - | - |
| Memory snapshots & time travel | ✅ | ❌ | ❌ |
| Entity evolution timeline | ✅ | ❌ | ❌ |
| Snapshot diff | ✅ | ❌ | ❌ |
| Semantic search | ❌ | ✅ | - |
| Graph memory | ❌ | ✅ | - |
| Self-editing memory | ❌ | - | ✅ |
| Cost | Free | Free tier + paid | Free |
Choose MemKraft when: you want portable, git-friendly, zero-dependency memory that works with any agent framework, offline, forever.
Choose something else when: you need semantic/vector search, graph traversal, or a full agent runtime with virtual context management.
Reproducing LongMemEval Results
MemKraft v1.0.2 achieves 98.0% on LongMemEval (LLM-as-judge, oracle subset, 3-run semantic majority vote). Single-run performance: 96–98% (non-deterministic at inference level — sampling, not memory).
Comparison vs prior SOTA:
- MemKraft 1.0.2 — 98.0% (LLM-judge, oracle 50, 3-run majority)
- MemPalace — 96.6%
- MEMENTO/MS — 90.8%
Setup
```shell
git clone https://github.com/seojoonkim/memkraft
cd memkraft
pip install -e ".[bench]"
```
Run
```shell
cd benchmarks/longmemeval

# Single run (96% typical)
MODEL="claude-sonnet-4-6" \
ANTHROPIC_API_KEY="your-key" \
TAG="myrun" \
python3 run.py 50 oracle

# LLM-as-judge scoring
MODEL="claude-sonnet-4-6" \
ANTHROPIC_API_KEY="your-key" \
python3 llm_judge.py

# 3-run majority vote (98% typical)
MODEL="claude-sonnet-4-6" \
ANTHROPIC_API_KEY="your-key" \
python3 run_majority_vote.py
```
Notes
- Dataset: LongMemEval oracle subset (50 questions)
- Judge: LLM-as-judge (claude-sonnet-4-6) — semantic matching, not string match
- 98% = 3-run semantic majority vote result
- Single run: 96–100% depending on inference sampling
- Reproducibility note: Variance comes from LLM inference sampling, not from MemKraft itself. Memory storage and retrieval are deterministic.
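The 3-run semantic majority vote reduces that sampling variance: a question counts as correct when at least 2 of 3 runs judge it correct. A sketch, assuming a per-question boolean verdict format (the actual benchmark scripts may differ):

```python
# Per-question majority vote over N independent judged runs.
from collections import Counter

def majority_vote(runs: list[dict[str, bool]]) -> dict:
    """runs: list of {question_id: judged_correct} from independent runs."""
    verdicts = {}
    for q in runs[0]:
        votes = Counter(run[q] for run in runs)
        verdicts[q] = votes[True] > votes[False]
    correct = sum(verdicts.values())
    return {"accuracy": 100.0 * correct / len(verdicts),
            "verdicts": verdicts}

runs = [
    {"q1": True, "q2": True, "q3": False},
    {"q1": True, "q2": False, "q3": False},
    {"q1": True, "q2": True, "q3": False},
]
result = majority_vote(runs)  # q2 passes 2-1, q3 fails 0-3
```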
Contributing
PRs welcome. See CONTRIBUTING.md.
License
MIT - use it however you want.
Changelog
v0.8.1 (2026-04-17)
The "connect-any-agent-in-30-seconds" release. Fully backward-compatible.
- `mk.init()` now returns a dict (`{"created": [...], "exists": [...], "base_dir": "..."}`) so scripts and tests can branch on it without parsing stdout.
- `memkraft agents-hint <target>` CLI — paste-ready integration blocks for `claude-code`, `openclaw`, `openai`, `cursor`, `mcp`, `langchain`. Supports `--format json` and `--base-dir`.
- `examples/` folder — drop-in AGENTS.md, OpenAI function-calling, 10-line RAG loop.
- `python -m memkraft.mcp` — MCP stdio server exposing `remember / search / recall / link`. Extras: `pip install 'memkraft[mcp]'`.
- `memkraft watch` — filesystem auto-reindex. Extras: `pip install 'memkraft[watch]'`.
- `memkraft doctor` — health check with 🟢/🟡/🔴 icons and fix hints.
- 515 tests passing (was 492, +23 new).
v0.8.0 (2026-04-17)
Four new subsystems — all zero-dep, all backward-compatible.
1. Bitemporal Fact Layer — track facts with separate valid_time and record_time.
```python
mk.fact_add("Simon", "role", "CEO of Hashed", valid_from="2020-03-01")
mk.fact_add("Simon", "role", "CTO", valid_from="2018-01-01", valid_to="2020-02-29")
mk.fact_at("Simon", "role", as_of="2019-06-01")  # -> {"value": "CTO", ...}
mk.fact_history("Simon")                         # full timeline, recorded-order
mk.fact_invalidate("Simon", "role", invalid_at="2026-04-17")
```
Stored as inline Markdown markers in memory/facts/<slug>.md — human-readable, git-diffable.
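The as-of lookup behind `fact_at` can be illustrated with a standalone sketch. The data model below is hypothetical (plain dicts, not MemKraft internals): a fact matches when the query date falls inside its `[valid_from, valid_to]` validity window.

```python
from datetime import date

def fact_at(facts, key, as_of):
    """Return the value of `key` that was valid on `as_of`.

    `facts` is a list of dicts with ISO valid_from / valid_to dates
    (valid_to=None means "still valid"). Illustrative sketch only.
    """
    target = date.fromisoformat(as_of)
    for f in facts:
        if f["key"] != key:
            continue
        start = date.fromisoformat(f["valid_from"])
        end = date.fromisoformat(f["valid_to"]) if f["valid_to"] else date.max
        if start <= target <= end:
            return f["value"]
    return None

facts = [
    {"key": "role", "value": "CTO", "valid_from": "2018-01-01", "valid_to": "2020-02-29"},
    {"key": "role", "value": "CEO of Hashed", "valid_from": "2020-03-01", "valid_to": None},
]
print(fact_at(facts, "role", "2019-06-01"))  # → CTO
```

Keeping valid time separate from record time is what lets a later correction coexist with the original entry instead of overwriting it.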
2. Memory Tier Labels + Working Set — Letta-style core | recall | archival via a single YAML frontmatter line.
```python
mk.tier_set(memory_id, tier="core")
mk.tier_promote(memory_id)  # archival -> recall -> core
mk.tier_demote(memory_id)
mk.tier_list(tier="core")
mk.working_set(limit=10)    # all core + recently-accessed recall
```
3. Reversible Decay + Tombstone — memories fade numerically instead of being deleted, and tombstoned files move to .memkraft/tombstones/ (still restorable).
```python
mk.decay_apply(memory_id, decay_rate=0.5)   # weight 1.0 -> 0.5
mk.decay_list(below_threshold=0.1)          # show faded memories
mk.decay_run(criteria={"weight_gt": 0.5})   # batch decay (cron)
mk.decay_tombstone(memory_id)               # move to tombstones, still on disk
mk.decay_restore(memory_id)                 # full undo — weight back to 1.0
```
4. Cross-Entity Link Graph + Backlinks — [[Wiki Link]] patterns become a bidirectional graph; the file system is the DB.
```python
mk.link_scan()                     # build/refresh index
mk.link_backlinks("Simon")         # files that mention [[Simon]]
mk.link_forward("inbox/notes.md")  # entities referenced from a file
mk.link_graph("Simon", hops=2)     # N-hop neighbourhood
mk.link_orphans()                  # entities referenced but never defined
```
Index persisted at .memkraft/links/backlinks.json and .memkraft/links/forward.json.
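The backlink index boils down to scanning text for `[[Wiki Link]]` patterns and inverting the mapping. A minimal sketch over in-memory strings (the real implementation presumably walks the `memory/` tree and persists the JSON files above):

```python
import re
from collections import defaultdict

WIKI_LINK = re.compile(r"\[\[([^\]|]+)\]\]")

def build_backlinks(files: dict[str, str]) -> dict[str, list[str]]:
    """files: path -> markdown text. Returns entity -> paths that mention it."""
    backlinks = defaultdict(list)
    for path, text in files.items():
        for entity in set(WIKI_LINK.findall(text)):
            backlinks[entity].append(path)
    return dict(backlinks)

files = {
    "inbox/notes.md": "Met [[Simon]] about the [[Hashed]] deal.",
    "entities/simon.md": "[[Simon]] joined [[Hashed]] in 2020.",
}
print(sorted(build_backlinks(files)["Simon"]))
# → ['entities/simon.md', 'inbox/notes.md']
```

The forward index is the uninverted version of the same scan, which is why both can be rebuilt from the files at any time: the file system really is the DB.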
Tests: 409 → 492 (83 new across test_v080_*).
v0.7.0 (2026-04-15)
- `channel_update` modes: `mode="append"` (list append) and `mode="merge"` (dict shallow merge) added. Default `mode="set"` unchanged — fully backward compatible.
- Task delegation tracking: `mk.task_delegate(task_id, from_agent, to_agent, context_note)` — delegate a task between agents, with delegation events recorded in history. `task_start()` gains an optional `delegated_by` param.
- `agent_inject` filters: `max_history` (default 5) limits task history entries; `include_completed_tasks=True` includes completed channel tasks in the inject block.
- Agent handoff: `mk.agent_handoff(from_agent, to_agent, task_id, context_note)` — transfers working memory context, records the handoff in `to_agent` memory, and delegates the task. Returns an inject-ready context block.
- Channel task listing: `mk.channel_tasks(channel_id, status, limit)` — filter tasks by channel and status (active/completed/all), sorted by creation time descending.
- Task cleanup: `mk.task_cleanup(max_age_days, archive)` — archive or delete completed tasks older than a threshold. Archive goes to `.memkraft/tasks/archive/`.
- New CLI commands: `channel-update --mode`, `task-delegate`, `channel-tasks`, `agent-handoff`, `task-cleanup`
- Tests: 357 → 409 (52 new in `test_v070_multiagent.py`)
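A handoff like `agent_handoff` essentially bundles the sender's working memory plus a context note into an inject-ready block. Here is a hypothetical illustration in plain dicts and strings; the names and the block format are inventions for this sketch, not MemKraft's actual output:

```python
def build_handoff_block(from_agent: str, to_agent: str, task_id: str,
                        working_memory: dict, context_note: str) -> str:
    """Render an inject-ready context block for the receiving agent (sketch)."""
    lines = [
        f"## Handoff: {from_agent} → {to_agent} (task {task_id})",
        f"Note: {context_note}",
        "### Working memory",
    ]
    lines += [f"- {k}: {v}" for k, v in working_memory.items()]
    return "\n".join(lines)

block = build_handoff_block(
    "researcher", "writer", "T-42",
    {"draft_status": "outline done"}, "Please expand section 2.",
)
print(block.splitlines()[0])  # → ## Handoff: researcher → writer (task T-42)
```

The point of an inject-ready block is that the receiving agent's prompt needs no extra lookup step: the context travels with the delegation.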
v0.5.4 (2026-04-15)
- Channel Context Memory: `mk.channel_save()` / `mk.channel_load()` / `mk.channel_update()` — per-channel context persistence keyed by channel ID. Stored in `.memkraft/channels/{channel_id}.json`. Enables agents to recall channel-specific summaries, recent tasks, and preferences across sessions.
- Task Continuity Register: `mk.task_start()` / `mk.task_update()` / `mk.task_complete()` / `mk.task_history()` / `mk.task_list()` — full task lifecycle with timestamped history. Stored in `.memkraft/tasks/{task_id}.json`.
- Agent Working Memory: `mk.agent_save()` / `mk.agent_load()` / `mk.agent_inject()` — per-agent persistent working memory. `agent_inject()` merges agent memory + channel context + task history into a single ready-to-inject prompt block for sub-agent delegation.
- CLI commands: `channel-save/load`, `task-start/update/list`, `agent-save/load/inject`
- Zero-dependency maintained (stdlib only: json, pathlib, datetime)
- Tests: 328 → 377 (49 new in `test_v054_context.py`)
v0.5.1 (2026-04-14)
- Memory Snapshots & Time Travel: `mk.snapshot()` / `mk.snapshot_list()` / `mk.snapshot_diff()` / `mk.time_travel()` / `mk.snapshot_entity()` — create point-in-time snapshots of all memory files (hash, size, summary, sections, fact count, link count), compare any two snapshots to see what changed, search memory as it was at a past date, and track how individual entities evolved over time
- CLI snapshot commands: `memkraft snapshot` / `snapshot-list` / `snapshot-diff` / `time-travel` / `snapshot-entity`
- Snapshot manifests saved as JSON under `.memkraft/snapshots/` — zero-dependency, git-friendly
- Optional `--include-content` flag embeds full file text in snapshots for richer time-travel queries
- Date-based time travel: `time-travel "query" --date 2026-03-01` finds the closest snapshot on or before that date
- Tests: 277 → 328 (51 new for Snapshots & Time Travel)
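Snapshot diffing reduces to comparing per-file content hashes between two manifests. A hypothetical sketch of that core idea, not the `snapshot_diff` implementation:

```python
import hashlib

def manifest(files: dict[str, str]) -> dict[str, str]:
    """path -> sha256 of content (stand-in for a snapshot manifest)."""
    return {p: hashlib.sha256(t.encode()).hexdigest() for p, t in files.items()}

def snapshot_diff(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Classify every path as added, removed, or changed between manifests."""
    return {
        "added":   sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
    }

a = manifest({"simon.md": "CTO", "hashed.md": "fund"})
b = manifest({"simon.md": "CEO", "hashed.md": "fund", "new.md": "hi"})
print(snapshot_diff(a, b))
# → {'added': ['new.md'], 'removed': [], 'changed': ['simon.md']}
```

Because manifests store hashes rather than full text, diffing stays cheap even when `--include-content` is off; content-level diffs need the optional embedded text.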
v0.4.1 (2026-04-13)
- README: comprehensive "Debugging is Memory" section with flow diagram, full API/CLI reference for debug methods
- README: Appendix — Inspirations & Credits (8 projects with links)
- Tests: 277 (79 new for Debug Hypothesis Tracking)
v0.4.0 (2026-04-13)
- Debug Hypothesis Tracking (Debugging is Memory): `mk.start_debug()` / `mk.log_hypothesis()` / `mk.log_evidence()` / `mk.reject_hypothesis()` / `mk.confirm_hypothesis()` / `mk.end_debug()` — full OBSERVE → HYPOTHESIZE → EXPERIMENT → CONCLUDE loop with persistent failure memory, 2-fail auto-switch warning, anti-pattern detection via `search_rejected_hypotheses()`, and feedback into entity timelines
- CLI debug commands: `memkraft debug start|hypothesis|evidence|reject|confirm|status|history|search-rejected`
- Tests: 198 → 277
v0.3.0 (2026-04-13)
- Query-to-Memory Feedback Loop: `agentic-search --file-back` / `search --file-back` — search results auto-filed back to entity timelines (compound interest for memory)
- Confidence Levels: all facts support `verified` / `experimental` / `hypothesis` tags; `extract --confidence verified`; Dream Cycle warns about untagged facts; agentic-search re-ranking weights by confidence; conflict resolution via `--strategy confidence`
- Memory Health Assertions: `memkraft health-check` — 5 self-diagnostic assertions (source attribution, orphan facts, duplicates, inbox freshness, unresolved conflicts) with pass rate % and a health score (A/B/C/D); auto-runs in Dream Cycle
- Applicability Conditions: `extract --when "condition" --when-not "condition"` — facts get `When:` / `When NOT:` metadata; agentic-search boosts results matching the current context's applicability conditions
- Python re-export: `from memkraft import MemKraft` now works directly
- Tests: 158 → 198
v0.2.0 (2026-04-12)
- Goal-Weighted Reconstructive Memory (Conway SMS): `agentic-search --context` — the same query with different context produces different result rankings; memory-type-aware re-ranking with differential decay curves
- Dialectic Synthesis: auto-detect contradictory facts during `extract`, tag with `[CONFLICT]`, generate a `CONFLICTS.md` report, resolve via `dream --resolve-conflicts` or the `resolve-conflicts` command
- Memory Type Classification: 8 memory types (identity, belief, preference, relationship, skill, episodic, routine, transient) with differential decay multipliers
- Type-Aware Decay: identity memories decay 10× slower than routine memories
- Tests: 112 → 158
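Type-aware decay can be modelled as exponential decay with a per-type rate multiplier. In the sketch below the multiplier values are illustrative: only the 10× identity/routine ratio comes from the changelog, everything else is assumed:

```python
import math

# Illustrative multipliers; only the 10x identity/routine ratio is from the changelog.
DECAY_MULTIPLIER = {"identity": 0.1, "routine": 1.0, "episodic": 0.8, "transient": 2.0}

def decayed_weight(age_days: float, memory_type: str, base_rate: float = 0.01) -> float:
    """weight = exp(-base_rate * type_multiplier * age_days)."""
    return math.exp(-base_rate * DECAY_MULTIPLIER[memory_type] * age_days)

# After 100 days, a routine memory has faded far more than an identity memory:
print(round(decayed_weight(100, "routine"), 3))   # → 0.368
print(round(decayed_weight(100, "identity"), 3))  # → 0.905
```

Differential curves like this are what let "who the user is" outlive "what the user did last Tuesday" without any explicit pruning rule.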
v0.1.0 (2026-04-12)
- Initial release: extract, detect, decay, dedup, summarize, agentic search
- Entity tracking (track, update, brief, promote)
- Dream Cycle (7 health checks), cognify, retro
- Hybrid search (exact + IDF-weighted + fuzzy), agentic multi-hop search
- Zero dependencies - stdlib only
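The hybrid scoring idea, exact matches boosted by IDF-weighted term overlap, can be sketched as follows. This is an illustrative toy scorer, not MemKraft's actual ranking function:

```python
import math
from collections import Counter

def idf_scores(docs: dict[str, str], query: str) -> dict[str, float]:
    """Score each doc by the summed IDF of the query terms it contains."""
    tokenized = {name: set(text.lower().split()) for name, text in docs.items()}
    n = len(docs)
    df = Counter(t for terms in tokenized.values() for t in terms)  # document frequency
    scores = {}
    for name, terms in tokenized.items():
        scores[name] = sum(
            math.log(n / df[t]) for t in query.lower().split() if t in terms
        )
    return scores

docs = {
    "simon.md": "simon is ceo of hashed",
    "notes.md": "hashed is a fund",
}
s = idf_scores(docs, "simon hashed")
print(s["simon.md"] > s["notes.md"])  # → True ("simon" is rarer than "hashed")
```

IDF weighting is why a rare, discriminative term dominates the ranking while terms appearing in every file contribute nothing; the fuzzy layer would then catch near-miss spellings on top of this.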
MemKraft - Agents don't learn. They search. Until now.
🤖 Autonomous Memory Management (v1.1.0)
"Memory should manage itself."
Memory tends to grow without limit — agents add entries but rarely clean up. MemKraft 1.1.0 solves this with a self-managing lifecycle.
The Problem
- Add-only pattern: agents append to MEMORY.md every session, never prune
- Silent maintenance failures: nightly cleanup crons fail without notice
- No lifecycle: every memory entry treated equally, forever
The Solution: flush → compact → digest
```python
from memkraft import MemKraft

mk = MemKraft(base_dir="memory/")

# 1. Import existing MEMORY.md → structured MemKraft data
mk.flush("MEMORY.md")

# 2. Auto-archive old/low-priority items
result = mk.compact(max_chars=15000)
# → {"moved": 47, "freed_chars": 89400, ...}

# 3. Re-render MEMORY.md — always ≤ 15KB
mk.digest("MEMORY.md")
# → {"chars": 11700, "truncated": False}

# 4. Check memory health
health = mk.health()
# → {"status": "healthy", "total_chars": 11700, "recommendations": [...]}
```
Real-world result
Our MEMORY.md grew to 153KB (1,862 lines) over weeks of agent sessions.
After flush → compact → digest: 11.7KB (170 lines). 92% reduction.
Nightly self-cleanup recipe
```python
# Watch for real-time sync
mk.watch("memory/", on_change="flush", interval=300)

# Or set a nightly schedule (requires: pip install memkraft[schedule])
mk.schedule([
    lambda: mk.compact(max_chars=15000),
    lambda: mk.digest("MEMORY.md"),
], cron_expr="0 23 * * *")
```
Appendix: Inspirations & Credits
MemKraft stands on the shoulders of giants. These projects and ideas shaped our approach:
| Project | Inspiration | Link |
|---|---|---|
| Karpathy auto-research | Evidence-based autonomous research methodology | Tweet |
| Shen Huang debug-hypothesis | Scientific debugging: hypothesis-driven, max 5-line experiments | GitHub · Tweet |
| Letta (MemGPT) | Tiered memory architecture (core / archival / recall) | GitHub |
| mem0 | Agent memory extraction and retrieval patterns | GitHub |
| Zep | Temporal memory decay and entity extraction | GitHub |
| MemoryWeaver | Dialectic synthesis and memory reconstruction | GitHub |
| Shubham Saboo's 6-agent system | OpenClaw-based multi-agent + SOUL.md / MEMORY.md pattern | Article |
| Karpathy llm-wiki | Wiki-style structured knowledge for LLMs | Tweet |
"If I have seen further, it is by standing on the shoulders of giants."
Thank you to all these creators for sharing their work openly. MemKraft exists because of you.