Prepared for: Luci (self-assessment) and Elmar (decision-maker)
Date: 2026-04-11
Triggered by: GBrain release (Garry Tan, 2026-04-10)
NotebookLM notebook: "Memory Systems for AI Agents — Landscape Review" (51 sources)
The AI agent memory landscape has matured rapidly. Ten significant frameworks exist, three academic papers provide theoretical grounding, and expert opinion is genuinely divided on key architectural questions. After reviewing 50+ sources, the core finding is this: Luci's current markdown-based memory system is architecturally sound and aligned with the emerging expert consensus (Karpathy, Willison, Harrison Chase all converge on file-based, human-readable memory). The gaps are not in the storage model but in three specific capabilities: (1) entity extraction and compiled people-pages, (2) overnight consolidation/dream cycles, and (3) hybrid retrieval beyond grep.
Recommendation: Keep markdown as source of truth. Add a lightweight dream cycle for consolidation. Add compiled entity pages. Optionally add SQLite FTS5 for hybrid search. Do not adopt GBrain wholesale — steal the "compiled truth + timeline" page pattern and the dream cycle concept instead.
| System | Architecture | Storage | Retrieval | Open Source | Production-Ready |
|---|---|---|---|---|---|
| GBrain (Garry Tan) | Markdown + pgvector hybrid | Git-backed .md files + Postgres | Hybrid: vector + keyword + RRF | Yes (MIT) | Personal use |
| MemGPT / Letta | OS-inspired tiered memory | Message buffer + core + recall + archival | Agent-managed paging | Yes | Yes (Letta platform) |
| Mem0 | Universal memory layer | Multi-scope (user/agent/session/org) | Graph + vector + keyword | Yes | Yes (186M API calls/quarter) |
| LangMem | SDK for agent learning | LangGraph BaseStore | Semantic + episodic + procedural | Yes | LangChain ecosystem |
| LlamaIndex | Composable memory modules | Pluggable backends | Vector + summary + composable | Yes | RAG-heavy workflows |
| Obsidian + MCP | Vault as memory backend | .md files in folders | BM25 or grep | Community | Personal use |
| Karpathy LLM Wiki | Compiled markdown knowledge base | raw/ + wiki/ + schema | Optional BM25/hybrid | Concept (gist) | Concept |
| OpenAI Assistants | Server-managed threads | OpenAI servers | Full-thread re-processing | No | Being deprecated (Aug 2026) |
| Anthropic Memory | File-based, client-side | /memories directory | Glob/grep + compaction | Partial (MCP) | Yes (Claude Code) |
| Devin (Cognition) | DeepWiki + proprietary | Proprietary | Code-specialized indexing | No | Enterprise ($500/mo) |
Convergence on files: Three independent high-profile systems (Claude Code, GBrain, Manus) converged on markdown files as the memory substrate. Harrison Chase (LangChain) explicitly stated: "I very, very strongly believe that if you're building a long-horizon agent, you need to give it access to a file system." This is not a coincidence — files are inspectable, versioned, portable, and work with every tool.
MemGPT's conceptual influence: Even systems that don't use MemGPT's code adopt its mental model — tiered memory with working/short-term/long-term layers and agent-managed promotion between tiers. This is the dominant conceptual framework in the field.
Mem0 is the production benchmark: With 186M API calls/quarter and published benchmarks (26% accuracy improvement over OpenAI, 91% lower p95 latency, 90% token cost savings), Mem0 is the system to beat on quantitative metrics. Its graph memory variant adds entity relationship tracking.
GBrain's specific contribution: The "compiled truth + timeline" page model is genuinely novel. Each entity page has a rewritable top section (current understanding) and an append-only bottom section (evidence trail with dates). This separates the compiled state from the raw evidence — you can always trace why the system believes something.
Vector-only (pgvector, Pinecone, Weaviate, Chroma):
- pgvector: 5-8ms HNSW at <5M vectors; pgvectorscale hit 471 QPS at 99% recall on 50M vectors
- Best for semantic similarity ("planned parenthood" → "reproductive rights")
- Worst for exact matches (entity names, error messages, project codes)
- Cost: managed vector DBs run $50-200/GB/month vs $0.02/GB/month for local disk
Keyword-only (FTS5, Elasticsearch):
- SQLite FTS5: sub-millisecond BM25 search, zero dependencies, single-file deployment
- Best for exact intent: "what you search is what you get"
- A contrarian benchmark showed FTS5 beating Pinecone on 4,300 agent memories: <1ms vs 50-200ms
- Worst for semantic fuzzy matching
Hybrid (RRF + reranking):
- BM25 + dense retrieval fused via Reciprocal Rank Fusion, then cross-encoder reranking
- TREC iKAT 2025: nDCG@10 improved from 0.4218 to 0.4425 with RRF before reranking
- The consensus "best pipeline": BM25 + vector → RRF → cross-encoder reranker → top-5
- GBrain uses this pattern natively with pgvector + Postgres FTS
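The fusion step in the hybrid pipeline is small enough to sketch. The following is a minimal illustration of Reciprocal Rank Fusion, not GBrain's implementation: each document scores the sum of 1 / (k + rank) over the lists it appears in. The constant k=60 is the commonly used default and an assumption here, and the file names are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, top_n=5):
    """Fuse several ranked lists of document ids into a single ranking.

    A document's fused score is sum(1 / (k + rank)) over every list that
    contains it, with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is stable.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in fused[:top_n]]


# Hypothetical result lists from a keyword search and a vector search.
bm25_hits = ["stephan.md", "nmg.md", "chazelle.md"]
vector_hits = ["nmg.md", "elmar-prefs.md", "stephan.md"]
fused_hits = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused_hits)
```

Documents that rank well in both lists rise to the top, which is why RRF needs no score calibration between the keyword and vector retrievers. A cross-encoder reranker would then reorder `fused_hits`; that step is left out of this sketch.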
Graph-based (Neo4j, GraphRAG, Graphiti):
- Microsoft GraphRAG: LLM builds entity knowledge graph, pregenerates community summaries
- Zep/Graphiti: 94.8% on DMR benchmark (vs MemGPT's 93.4%), bi-temporal model
- Best for multi-hop reasoning across entities
- Worst for setup cost and maintenance; Hamel Husain: "Graph databases are unnecessary — simpler solutions like CSV or Postgres typically suffice"
Luci currently uses grep + glob (keyword search over markdown files). This is equivalent to basic keyword retrieval with no semantic matching and no ranking. It works because:
1. The corpus is small (~200 memory files, ~50 vault files)
2. Entity names are explicit and greppable
3. MEMORY.md index provides a manual routing layer
The gap: as the corpus grows, grep won't surface semantically related memories that don't share exact keywords.
Add SQLite FTS5 as a second retrieval path. This is the highest-value, lowest-cost upgrade:
- Zero new dependencies (SQLite is already used for mc.db, vault.db, email.db)
- Sub-millisecond BM25 ranking
- Can index all memory files, vault entries, email subjects, and WhatsApp messages
- Trivial to implement: `CREATE VIRTUAL TABLE memory_fts USING fts5(...)`
- Does NOT require a vector database, embedding model, or GPU
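To make the FTS5 path concrete, here is a minimal sketch using Python's built-in sqlite3 module. The two-column schema (path, body), the in-memory database, and the sample rows are all assumptions invented for illustration. It also assumes the local SQLite build includes FTS5, which is typical but not guaranteed.

```python
import sqlite3

# In-memory database for the sketch; a real index would live in a file.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per memory file.
conn.execute("CREATE VIRTUAL TABLE memory_fts USING fts5(path, body)")
conn.executemany(
    "INSERT INTO memory_fts (path, body) VALUES (?, ?)",
    [
        # Invented sample rows, not real memory content.
        ("entities/stephan.md", "Stephan asked about the NMG budget."),
        ("projects/nmg.md", "NMG kickoff notes and timeline."),
        ("prefs/elmar.md", "Elmar prefers direct communication."),
    ],
)


def search(query, limit=5):
    """Return (path, score) pairs, best match first.

    FTS5's bm25() returns lower values for better matches, so an ascending
    sort puts the most relevant rows on top.
    """
    rows = conn.execute(
        "SELECT path, bm25(memory_fts) FROM memory_fts "
        "WHERE memory_fts MATCH ? ORDER BY bm25(memory_fts) LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()


results = search("nmg")
print(results)
```

The default tokenizer is case-insensitive, so the query "nmg" matches "NMG" in both the path and body columns. Rebuilding the table from the markdown files keeps markdown as the source of truth and the index as a disposable artifact.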
Skip vector search for now. At Luci's scale (<1,000 memory entries), FTS5 + the existing grep/glob is sufficient. Vector search adds value at 10K+ entries. Revisit if the corpus grows.
Skip graph memory. Entity relationships can be captured in compiled entity pages (markdown links) rather than a graph database. The maintenance overhead of Neo4j/Graphiti is not justified for a single-user agent.
Markdown-as-source-of-truth (Luci's current model, Karpathy's LLM Wiki, Claude Code):
- Files are the canonical store. The LLM reads, writes, and maintains them.
- Git provides versioning and audit trail.
- Human-readable and editable: Elmar can inspect and correct any memory.
- Scales poorly beyond ~10K notes without indexing. Temporal queries ("what happened last week?") require additional metadata.
Append-only event logs (MemGPT/Letta):
- Every interaction is an immutable event. The agent pages relevant events into working context.
- Preserves full history; nothing is lost.
- Requires active management (the agent must decide what to promote/archive).
- Can grow unbounded without consolidation.
Compiled entity pages (GBrain, Karpathy Wiki):
- Auto-generated pages for people, projects, and concepts that aggregate all mentions.
- Top section: rewritable current understanding (compiled truth).
- Bottom section: append-only evidence trail with dates and sources.
- The knowledge compounds: each new source enriches existing pages.
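The two-section page model can be sketched as a single update function: the top section is replaced wholesale and the timeline only ever grows. The page layout, the "## Timeline" marker, and the function name are assumptions made for this illustration, not GBrain's actual format. The sample facts are invented.

```python
from datetime import date

TIMELINE_MARKER = "## Timeline"  # hypothetical delimiter between the two sections


def update_entity_page(page_text, name, compiled_truth, evidence, source, when):
    """Rewrite the compiled-truth section and append one dated timeline entry.

    Everything above TIMELINE_MARKER is discarded and regenerated; existing
    timeline entries below it are preserved in order.
    """
    _, found, timeline = page_text.partition(TIMELINE_MARKER)
    entries = [ln for ln in timeline.splitlines() if ln.startswith("- ")] if found else []
    entries.append(f"- {when.isoformat()}: {evidence} (source: {source})")
    return "\n".join(
        [f"# {name}", "", compiled_truth.strip(), "", TIMELINE_MARKER, "", *entries, ""]
    )


# Invented example: two updates to the same page on consecutive days.
page = update_entity_page(
    "", "Stephan", "Contact for the NMG project.",
    "Mentioned in an email thread", "email.db", date(2026, 4, 10),
)
page = update_entity_page(
    page, "Stephan", "Contact for the NMG project; prefers WhatsApp.",
    "Follow-up message", "whatsapp-messages.db", date(2026, 4, 11),
)
print(page)
```

After the second call the top section holds only the newest understanding, while both dated entries remain in the timeline, so it is always possible to trace why the page says what it says.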
Conversation threads (OpenAI Assistants, Memoria):
- Raw conversation history as the memory substrate.
- Simple but expensive (re-processes full thread per query).
- OpenAI's approach scored worst on LOCOMO benchmark (52.9% accuracy).
Keep markdown-as-source-of-truth. Add compiled entity pages as a second layer.
The compiled entity page pattern from GBrain/Karpathy is the single highest-value addition to Luci's memory. Here's why:
Luci already has the raw data. vault.db has entities, edges, and file references. email.db has contacts. whatsapp-messages.db has conversation history. The data exists but isn't compiled.
Entity pages solve the "who is X?" problem. When Elmar mentions "Stephan" or "Chazelle" or "NMG", Luci should be able to pull up a compiled page: who they are, recent interactions, key facts, contact details, linked projects.
The compilation can run as a dream cycle. A nightly script scans vault.db, email.db, and whatsapp-messages.db, extracts entity mentions, and compiles/updates markdown pages in a ~/.claude/memory/entities/ directory.
Markdown stays the source of truth. The compiled pages are derived artifacts — regenerable from the raw data. Git tracks changes. Elmar can inspect and correct.
Use Claude itself for entity extraction during the dream cycle. The pipeline:
This is Tier 2 work — clear requirements, known patterns. The dream cycle script could be a Python scheduler task that calls Claude via the API for extraction.
| System | What Happens | When | Measured Impact |
|---|---|---|---|
| Stanford Generative Agents | Reflection: synthesize observations into abstract statements | Inline (importance threshold) | "Single biggest contributor to believable behavior" |
| MemGPT/Letta Sleep-time Compute | Pre-process context, anticipate queries, pre-compute reasoning | Between sessions (idle time) | ~5x compute reduction, up to 13% accuracy improvement |
| GBrain | Scan conversations, enrich entities, fix citations, consolidate | Overnight cron | Not formally measured |
| MemoryBank | Ebbinghaus forgetting curves — decay unaccessed memories | Continuous | Personality adaptation over time |
| A-MEM | Memory evolution — new memories trigger updates to old ones | On new memory arrival | SOTA on 6 models |
| Claude Code Auto-Dream | Prune, merge, refresh memory files | Idle time | Not formally measured |
The sleep-time compute paper (Lin et al., Letta, 2025) is the strongest academic justification for dream cycles. Key findings:
- ~5x reduction in test-time compute for equivalent accuracy
- Up to 13% accuracy improvement on GSM-Symbolic, 18% on AIME
- 2.5x cost reduction per query through amortization
- Effectiveness correlates with query predictability, so consolidation should focus on information the user is likely to ask about again
The insight: decompose the prompt into static context (pre-processable between sessions) and dynamic query (real-time). Sleep-time compute enriches the static context during idle periods.
A lightweight dream cycle as a scheduler task (nightly or every 6 hours):
Phase 1 — Scan: Read recent activity_log entries, new emails, new WhatsApp messages, completed ticket work.
Phase 2 — Extract: Use Claude API to extract entities, facts, and relationships from new data.
Phase 3 — Compile: Update entity pages in ~/.claude/memory/entities/. Update/merge existing memory files. Flag contradictions for human review.
Phase 4 — Prune: Apply forgetting curves — memories not accessed in 30+ days get moved to an archive. Stale project memories (completed projects) get marked as historical.
Phase 5 — Index: Rebuild FTS5 index from all memory files. Update MEMORY.md index.
Phase 6 — Report: Log what changed to activity_log. Optionally send a Telegram summary: "Dream cycle complete: 3 entity pages updated, 2 memories pruned, 1 contradiction flagged."
This is the GBrain dream cycle pattern adapted for Luci's infrastructure — no Postgres, no pgvector, just markdown + SQLite + Claude API.
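The phases above can be sketched as one function with the extraction step injected as a callable. This is an illustrative skeleton only: the in-memory dicts stand in for markdown files, `toy_extract` is a hypothetical stand-in for the Claude API call, and contradiction flagging and the FTS5 rebuild (Phase 5) are left out. All names and sample data are invented.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DreamReport:
    entities_updated: int
    memories_pruned: int

    def summary(self):
        # Phase 6: the line that would be logged or sent as a notification.
        return (f"Dream cycle complete: {self.entities_updated} entity pages updated, "
                f"{self.memories_pruned} memories pruned.")


def run_dream_cycle(new_items, entity_facts, last_access, extract, today, max_idle_days=30):
    """new_items: raw text gathered in Phase 1.
    entity_facts: dict entity -> list of facts (stand-in for entity pages).
    last_access: dict memory path -> date last read.
    extract: callable text -> list of (entity, fact) pairs (Phase 2).
    """
    touched = set()
    for text in new_items:
        for entity, fact in extract(text):                 # Phase 2: extract
            facts = entity_facts.setdefault(entity, [])
            if fact not in facts:                          # Phase 3: compile
                facts.append(fact)
                touched.add(entity)
    # Phase 4: archive anything idle for max_idle_days or longer.
    archived = sorted(
        path for path, seen in last_access.items()
        if (today - seen).days >= max_idle_days
    )
    for path in archived:
        del last_access[path]
    return DreamReport(len(touched), len(archived)), archived


KNOWN_ENTITIES = {"Stephan", "NMG"}  # hypothetical entity list


def toy_extract(text):
    """Naive stand-in for LLM extraction: match known names word by word."""
    return [(w, text) for w in text.replace(".", "").split() if w in KNOWN_ENTITIES]


entity_facts = {}
last_access = {
    "old-a.md": date(2026, 3, 1),
    "old-b.md": date(2026, 2, 1),
    "fresh.md": date(2026, 4, 10),
}
report, archived = run_dream_cycle(
    ["Stephan sent the NMG budget."], entity_facts, last_access,
    toy_extract, today=date(2026, 4, 11),
)
print(report.summary())
```

Injecting `extract` keeps the skeleton testable without network access; the production version would swap in a function that calls the model and parses its response.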
1. Generative Agents (Park et al., Stanford, 2023)
   - Introduced the memory stream + reflection + planning architecture
   - Reflection was the single biggest quality contributor
   - Retrieval scoring combines recency, importance, and relevance (all three needed)
   - Limitation: no forgetting, memory grows unbounded
2. Sleep-time Compute (Lin et al., Letta, 2025)
   - Academic foundation for dream cycles
   - 5x compute reduction, 13-18% accuracy improvement
   - Key insight: amortize reasoning across queries during idle time
   - Authors are MemGPT founders (direct lineage)
3. Episodic Memory Position Paper (Pink et al., 2025)
   - Argues Tulving's episodic/semantic distinction is critical for agents
   - Episodic: raw events with timestamps ("Elmar said X on 2026-04-10")
   - Semantic: extracted facts ("Elmar prefers direct communication")
   - The consolidation pathway (episodic → semantic) is the dream cycle
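The Generative Agents retrieval scoring mentioned in the first paper can be sketched as follows. This assumes an equally weighted sum of min-max-normalized recency, importance, and relevance, with recency as an exponential decay over hours since last access. The decay constant, the weights, and the field names are assumptions for illustration, and the memory records are invented.

```python
def min_max(values):
    """Scale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_memories(memories, now_hours, decay=0.995, weights=(1.0, 1.0, 1.0)):
    """Rank memories by a weighted sum of recency, importance, and relevance.

    Each memory is a dict with 'text', 'last_access_hours' (timestamp in hours),
    'importance' (e.g. a 1-10 rating), and 'relevance' (e.g. cosine similarity).
    """
    recency = min_max([decay ** (now_hours - m["last_access_hours"]) for m in memories])
    importance = min_max([m["importance"] for m in memories])
    relevance = min_max([m["relevance"] for m in memories])
    w_rec, w_imp, w_rel = weights
    scored = [
        (w_rec * r + w_imp * i + w_rel * v, m["text"])
        for r, i, v, m in zip(recency, importance, relevance, memories)
    ]
    return [text for _, text in sorted(scored, key=lambda pair: -pair[0])]


# Invented records: one old but important and relevant, one fresh but trivial.
memories = [
    {"text": "old-important", "last_access_hours": 0, "importance": 9, "relevance": 0.9},
    {"text": "fresh-trivial", "last_access_hours": 99, "importance": 2, "relevance": 0.1},
    {"text": "middling", "last_access_hours": 50, "importance": 5, "relevance": 0.5},
]
ranked = rank_memories(memories, now_hours=100)
print(ranked)
```

The example shows why all three signals are needed: the freshest memory ranks last because it is neither important nor relevant, while an old memory that scores highly on the other two still comes first.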
RAG vs Long Context:
| Position | Expert |
|---|---|
| RAG is dead — use compiled markdown wikis | Karpathy |
| RAG is NOT dead — economics and context rot | Hamel Husain |
| Context expands to fill limits — RAG stays relevant | Chip Huyen |
| RAG is a good hack, fine-tuning may matter more long-term | Jerry Liu |
| Long inputs > short prompts, but keep infra simple | Simon Willison |
Vector DBs — Necessary?
| Position | Expert |
|---|---|
| Single-vector embeddings lose critical info; use ColBERT | Hamel Husain |
| No vector DB needed — grep, file trees work better for code | MindStudio analysis |
| Faceted search with structured extraction > top-k vector | Jason Liu |
| Hybrid is fine, optional not required | Karpathy |
Graph Memory — Worth the Overhead?
| Position | Expert |
|---|---|
| Graph databases unnecessary — CSV or Postgres suffice | Hamel Husain |
| Significant operational overhead, not every query benefits | Mem0 analysis |
| Graph adds value for multi-hop entity reasoning | Neo4j/Graphiti advocates |
| Zep/Graphiti: 94.8% DMR, 18.5% accuracy improvement | Rasmussen et al. |
Three strong arguments:
1. Cost: $0.02/GB/month local disk vs $50-200/GB/month managed vector DBs
2. Inspectability: Human can read, edit, correct. No black-box embeddings.
3. Convergence: Claude Code, GBrain, Manus all independently chose markdown. This is a strong signal.
Counter-arguments:
- Markdown doesn't scale past ~10K files without indexing
- No semantic search (grep misses conceptual matches)
- Temporal queries require additional metadata
- Multi-hop reasoning across files requires iterative LLM calls
| Criterion | Keep Current Markdown | Add Entity Pages + Dream Cycle | Adopt GBrain | Adopt Mem0 | Build Custom (pgvector + graph) |
|---|---|---|---|---|---|
| Setup cost | Zero (already working) | Low (Python script + scheduler task) | Medium (Postgres, PGLite, or Supabase) | Medium (API integration or self-host) | High (Postgres + pgvector + schema) |
| Retrieval latency | <1ms (grep) | <1ms (grep + FTS5) | 5-50ms (hybrid pgvector + FTS) | ~1s median (hosted), faster self-hosted | 5-50ms |
| Recall quality | Good for exact, poor for semantic | Good for exact + BM25 ranked | Very good (hybrid + RRF) | Best measured (26% over OpenAI) | Depends on implementation |
| Maintenance | Manual MEMORY.md updates | Dream cycle automates most | Must maintain Postgres (full or PGLite) | API dependency or self-host complexity | High ongoing effort |
| Integration with Claude Code | Native (Read/Write/Grep tools) | Native + scheduled task | Requires MCP bridge or API wrapper | Requires API integration | Custom MCP server |
| Inspectability | Excellent (plain markdown) | Excellent (still markdown) | Good (markdown + Postgres) | Poor (opaque memory store) | Depends on implementation |
| Entity tracking | Manual memory files | Auto-compiled entity pages | Built-in (compiled truth + timeline) | Built-in (graph memory) | Custom entity extraction |
| Dream cycles | None | Nightly consolidation script | Built-in (overnight crons) | Not documented | Custom implementation |
| Portability | git clone | git clone | Postgres dependency | API dependency | Postgres dependency |
| Elmar can edit | Yes (markdown) | Yes (markdown) | Partially (markdown, not Postgres) | No | Partially |
Option A: Keep current markdown approach (do nothing)
- Pro: Working, simple, aligned with expert consensus
- Con: No entity compilation, no consolidation, no ranked retrieval
- Risk: Memory quality degrades as corpus grows
- Cost: Zero

Option B: Add entity pages + dream cycle + FTS5 (recommended)
- Pro: Highest value per effort, stays on markdown, adds the three missing capabilities
- Con: Requires building the dream cycle script and entity extraction pipeline
- Risk: Low (it's additive, doesn't change existing infra)
- Cost: ~1 week of dev time (Tier 2 ticket)
- This is the "compiled people-pages from existing infra" option from the ticket

Option C: Adopt GBrain prototype
- Pro: Battle-tested by Garry Tan, comprehensive feature set, MIT licensed
- Con: Requires Postgres (PGLite or full), opinionated toward Garry's workflow, tightly coupled to OpenClaw ecosystem
- Risk: Medium (introduces Postgres dependency, may not integrate cleanly with Claude Code's native file tools)
- Cost: ~2-3 weeks including migration and adaptation

Option D: Adopt a different pattern (Mem0, Letta, etc.)
- Pro: Most mature production systems with benchmarks
- Con: API dependency (Mem0 hosted) or significant self-hosting complexity (Letta), reduced inspectability
- Risk: Medium-high (vendor lock-in or maintenance burden)
- Cost: ~2-4 weeks
The evidence strongly supports keeping markdown as the foundation and adding three specific capabilities:
1. Compiled entity pages
   - Stored in ~/.claude/memory/entities/
2. Dream cycle (steal from Letta's sleep-time compute + GBrain's overnight crons)
   - Logs changes and optionally sends Telegram summary
3. SQLite FTS5 index (steal from Willison's SQLite-everything philosophy)
What NOT to build:
- No vector database (not needed at Luci's scale)
- No graph database (entity relationships captured in markdown links)
- No external API dependencies (everything local)
- No Postgres (SQLite is already the standard on Luci)
| Pattern | Source | How to Apply |
|---|---|---|
| Compiled truth + timeline pages | GBrain | Entity page format with rewritable top + append-only bottom |
| Sleep-time compute / dream cycles | Letta paper + GBrain crons | Nightly scheduler task for consolidation |
| Ebbinghaus forgetting curves | MemoryBank | Decay weight on memories by last-access date |
| Memory evolution | A-MEM | New facts trigger updates to existing entity pages |
| Reflection | Stanford Generative Agents | Dream cycle synthesizes observations into higher-level insights |
| Zettelkasten linking | A-MEM | Entity pages cross-link to related entities |
| Context engineering | Anthropic | Keep memory lean — "smallest set of high-signal tokens" |
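The forgetting-curve row in the table above can be illustrated with a simple exponential retention weight, R = exp(-t / S), where t is days since last access and S is a strength constant. This is a sketch under assumptions: S = 30 days and the archive threshold of exp(-1) are chosen here so the cut-off lines up with a 30-day idle window, and the paths are hypothetical.

```python
import math
from datetime import date


def retention(last_access, today, strength_days=30.0):
    """Ebbinghaus-style retention weight in (0, 1]; 1.0 means accessed today."""
    idle_days = max((today - last_access).days, 0)
    return math.exp(-idle_days / strength_days)


def split_for_archive(last_access_by_path, today, threshold=math.exp(-1)):
    """Partition memory paths into (keep, archive) by retention weight."""
    keep, archive = [], []
    for path, seen in sorted(last_access_by_path.items()):
        bucket = keep if retention(seen, today) > threshold else archive
        bucket.append(path)
    return keep, archive


# Hypothetical last-access dates for three memory files.
today = date(2026, 4, 11)
access_log = {
    "projects/done.md": date(2026, 2, 1),
    "entities/stephan.md": date(2026, 4, 10),
    "prefs/elmar.md": date(2026, 3, 25),
}
keep, archive = split_for_archive(access_log, today)
print(keep, archive)
```

Unlike a hard cut-off, the continuous weight can also be multiplied into a retrieval score so that stale memories fade gradually before they are archived.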
RAG vs long context? At Luci's scale (<1,000 memory entries), long context wins. The entire memory corpus fits in the 1M token window. But Chip Huyen's Context Expansion Law applies — as Luci grows, we'll need retrieval. FTS5 is the cheapest hedge.
Vector DBs necessary? No, not yet. pgvectorscale is impressive (471 QPS at 99% recall), but FTS5 at <1ms for keyword search covers Luci's needs. Revisit at 10K+ entries.
Graph memory? No. Entity relationships are better captured as markdown links between compiled entity pages. The overhead of Neo4j/Graphiti is not justified for a single-user agent.
GBrain? Respect the ideas, don't adopt the stack. GBrain's best concepts (compiled truth + timeline, dream cycles, hybrid retrieval) can be implemented on Luci's existing markdown + SQLite infrastructure without Postgres.
Is markdown enough? Yes, with enhancements. Karpathy, Willison, Chase, and Anthropic all validate this approach. The failures reported in production (Boschi 2026) are at scales (500+ files, multi-hop cross-file reasoning) that Luci hasn't hit yet. When we do, FTS5 + entity pages + dream cycles provide the safety net.
Report prepared by Luci. 51 sources loaded in NotebookLM notebook "Memory Systems for AI Agents — Landscape Review." NotebookLM Deep Research report generated separately as a complementary artifact.