
Memory Systems for AI Agents — Landscape Review & Decision Framework

Prepared for: Luci (self-assessment) and Elmar (decision-maker)
Date: 2026-04-11
Triggered by: GBrain release (Garry Tan, 2026-04-10)
NotebookLM notebook: "Memory Systems for AI Agents — Landscape Review" (51 sources)


Executive Summary

The AI agent memory landscape has matured rapidly. Ten significant frameworks exist, three academic papers provide theoretical grounding, and expert opinion is genuinely divided on key architectural questions. After reviewing 50+ sources, the core finding is this: Luci's current markdown-based memory system is architecturally sound and aligned with the emerging expert consensus (Karpathy, Willison, Harrison Chase all converge on file-based, human-readable memory). The gaps are not in the storage model but in three specific capabilities: (1) entity extraction and compiled people-pages, (2) overnight consolidation/dream cycles, and (3) hybrid retrieval beyond grep.

Recommendation: Keep markdown as source of truth. Add a lightweight dream cycle for consolidation. Add compiled entity pages. Optionally add SQLite FTS5 for hybrid search. Do not adopt GBrain wholesale — steal the "compiled truth + timeline" page pattern and the dream cycle concept instead.


1. Current Landscape

The Ten Systems

| System | Architecture | Storage | Retrieval | Open Source | Production-Ready |
|---|---|---|---|---|---|
| GBrain (Garry Tan) | Markdown + pgvector hybrid | Git-backed .md files + Postgres | Hybrid: vector + keyword + RRF | Yes (MIT) | Personal use |
| MemGPT / Letta | OS-inspired tiered memory | Message buffer + core + recall + archival | Agent-managed paging | Yes | Yes (Letta platform) |
| Mem0 | Universal memory layer | Multi-scope (user/agent/session/org) | Graph + vector + keyword | Yes | Yes (186M API calls/quarter) |
| LangMem | SDK for agent learning | LangGraph BaseStore | Semantic + episodic + procedural | Yes | LangChain ecosystem |
| LlamaIndex | Composable memory modules | Pluggable backends | Vector + summary + composable | Yes | RAG-heavy workflows |
| Obsidian + MCP | Vault as memory backend | .md files in folders | BM25 or grep | Community | Personal use |
| Karpathy LLM Wiki | Compiled markdown knowledge base | raw/ + wiki/ + schema | Optional BM25/hybrid | Concept (gist) | Concept |
| OpenAI Assistants | Server-managed threads | OpenAI servers | Full-thread re-processing | No | Being deprecated (Aug 2026) |
| Anthropic Memory | File-based, client-side | /memories directory | Glob/grep + compaction | Partial (MCP) | Yes (Claude Code) |
| Devin (Cognition) | DeepWiki + proprietary | Proprietary | Code-specialized indexing | No | Enterprise ($500/mo) |

Key Observations

Convergence on files: Three independent high-profile systems (Claude Code, GBrain, Manus) converged on markdown files as the memory substrate. Harrison Chase (LangChain) explicitly stated: "I very, very strongly believe that if you're building a long-horizon agent, you need to give it access to a file system." This is not a coincidence — files are inspectable, versioned, portable, and work with every tool.

MemGPT's conceptual influence: Even systems that don't use MemGPT's code adopt its mental model — tiered memory with working/short-term/long-term layers and agent-managed promotion between tiers. This is the dominant conceptual framework in the field.

Mem0 is the production benchmark: With 186M API calls/quarter and published benchmarks (26% relative accuracy improvement over OpenAI's memory; 91% lower p95 latency and roughly 90% token cost savings versus a full-context baseline), Mem0 is the system to beat on quantitative metrics. Its graph memory variant adds entity relationship tracking.

GBrain's specific contribution: The "compiled truth + timeline" page model is genuinely novel. Each entity page has a rewritable top section (current understanding) and an append-only bottom section (evidence trail with dates). This separates the compiled state from the raw evidence — you can always trace why the system believes something.


2. Retrieval Architectures

The Four Approaches

Vector-only (pgvector, Pinecone, Weaviate, Chroma):
- pgvector: 5-8ms HNSW at <5M vectors; pgvectorscale hit 471 QPS at 99% recall on 50M vectors
- Best for semantic similarity ("planned parenthood" → "reproductive rights")
- Worst for exact matches (entity names, error messages, project codes)
- Cost: managed vector DBs run $50-200/GB/month vs $0.02/GB/month for local disk

Keyword-only (FTS5, Elasticsearch):
- SQLite FTS5: sub-millisecond BM25 search, zero dependencies, single-file deployment
- Best for exact intent: "what you search is what you get"
- A contrarian benchmark showed FTS5 beating Pinecone on 4,300 agent memories: <1ms vs 50-200ms
- Worst for semantic fuzzy matching

Hybrid (RRF + reranking):
- BM25 + dense retrieval fused via Reciprocal Rank Fusion, then cross-encoder reranking
- TREC iKAT 2025: nDCG@10 improved from 0.4218 to 0.4425 with RRF before reranking
- The consensus "best pipeline": BM25 + vector → RRF → cross-encoder reranker → top-5
- GBrain uses this pattern natively with pgvector + Postgres FTS
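The fusion step in the hybrid pipeline can be sketched in a few lines. This is a minimal illustration of Reciprocal Rank Fusion, not GBrain's actual code: the function name, the file names, and the constant k=60 (a commonly used default) are assumptions made for the example, and the cross-encoder reranking stage is omitted.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one list, best first.

    Each document scores 1 / (k + rank) in every list it appears in
    (rank starts at 1); the per-list scores are summed.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a keyword search and a vector search.
bm25_hits = ["stephan.md", "nmg.md", "budget.md"]
vector_hits = ["nmg.md", "roadmap.md", "stephan.md"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

Because RRF only looks at ranks, it needs no score calibration between BM25 and cosine similarity, which is a large part of why it is the usual first choice for fusing heterogeneous retrievers.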

Graph-based (Neo4j, GraphRAG, Graphiti):
- Microsoft GraphRAG: LLM builds entity knowledge graph, pregenerates community summaries
- Zep/Graphiti: 94.8% on DMR benchmark (vs MemGPT's 93.4%), bi-temporal model
- Best for multi-hop reasoning across entities
- Worst for setup cost and maintenance; Hamel Husain: "Graph databases are unnecessary — simpler solutions like CSV or Postgres typically suffice"

What Luci Uses Today

Luci currently uses grep + glob (keyword search over markdown files). This is equivalent to basic keyword retrieval with no semantic matching and no ranking. It works because:

  1. The corpus is small (~200 memory files, ~50 vault files)
  2. Entity names are explicit and greppable
  3. MEMORY.md index provides a manual routing layer

The gap: as the corpus grows, grep won't surface semantically related memories that don't share exact keywords.

Recommendation for Luci

Add SQLite FTS5 as a second retrieval path. This is the highest-value, lowest-cost upgrade:
- Zero new dependencies (SQLite is already used for mc.db, vault.db, email.db)
- Sub-millisecond BM25 ranking
- Can index all memory files, vault entries, email subjects, and WhatsApp messages
- Trivial to implement: CREATE VIRTUAL TABLE memory_fts USING fts5(...)
- Does NOT require a vector database, embedding model, or GPU
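A minimal sketch of what such an index could look like, using Python's standard sqlite3 module. The table layout (an unindexed path column plus a body column), the sample rows, and the search helper are assumptions for illustration, and the sketch presumes the local SQLite build has FTS5 compiled in, which is typical but not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a real index would live in a file on disk

# `path` is stored but not searched (UNINDEXED); `body` holds the file text.
conn.execute("CREATE VIRTUAL TABLE memory_fts USING fts5(path UNINDEXED, body)")

# Hypothetical memory files, inlined here so the example is self-contained.
docs = [
    ("people/stephan.md", "Stephan leads the NMG project. Stephan prefers email."),
    ("projects/nmg.md", "NMG project kickoff notes."),
    ("misc/todo.md", "Buy coffee."),
]
conn.executemany("INSERT INTO memory_fts (path, body) VALUES (?, ?)", docs)


def search(conn, query, limit=5):
    """Return matching paths, best BM25 match first.

    FTS5's bm25() returns lower values for better matches, so ascending
    order puts the most relevant row first.
    """
    rows = conn.execute(
        "SELECT path FROM memory_fts WHERE memory_fts MATCH ? "
        "ORDER BY bm25(memory_fts) LIMIT ?",
        (query, limit),
    )
    return [row[0] for row in rows]


print(search(conn, "stephan"))
print(search(conn, "nmg"))
```

The MATCH argument is parsed as FTS5 query syntax, so free-form user text containing quotes or operators would need escaping before being passed in.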

Skip vector search for now. At Luci's scale (<1,000 memory entries), FTS5 + the existing grep/glob is sufficient. Vector search adds value at 10K+ entries. Revisit if the corpus grows.

Skip graph memory. Entity relationships can be captured in compiled entity pages (markdown links) rather than a graph database. The maintenance overhead of Neo4j/Graphiti is not justified for a single-user agent.


3. Data Model Patterns

The Four Models

Markdown-as-source-of-truth (Luci's current model, Karpathy's LLM Wiki, Claude Code):
- Files are the canonical store. The LLM reads, writes, and maintains them.
- Git provides versioning and audit trail.
- Human-readable and editable: Elmar can inspect and correct any memory.
- Scales poorly beyond ~10K notes without indexing. Temporal queries ("what happened last week?") require additional metadata.

Append-only event logs (MemGPT/Letta):
- Every interaction is an immutable event. The agent pages relevant events into working context.
- Preserves full history: nothing is lost.
- Requires active management (the agent must decide what to promote/archive).
- Can grow unbounded without consolidation.

Compiled entity pages (GBrain, Karpathy Wiki):
- Auto-generated pages for people, projects, and concepts that aggregate all mentions.
- Top section: rewritable current understanding (compiled truth).
- Bottom section: append-only evidence trail with dates and sources.
- The knowledge compounds: each new source enriches existing pages.

Conversation threads (OpenAI Assistants, Memoria):
- Raw conversation history as the memory substrate.
- Simple but expensive (re-processes full thread per query).
- OpenAI's approach scored worst on LOCOMO benchmark (52.9% accuracy).

What Luci Should Adopt

Keep markdown-as-source-of-truth. Add compiled entity pages as a second layer.

The compiled entity page pattern from GBrain/Karpathy is the single highest-value addition to Luci's memory. Here's why:

  1. Luci already has the raw data. vault.db has entities, edges, and file references. email.db has contacts. whatsapp-messages.db has conversation history. The data exists but isn't compiled.

  2. Entity pages solve the "who is X?" problem. When Elmar mentions "Stephan" or "Chazelle" or "NMG", Luci should be able to pull up a compiled page: who they are, recent interactions, key facts, contact details, linked projects.

  3. The compilation can run as a dream cycle. A nightly script scans vault.db, email.db, and whatsapp-messages.db, extracts entity mentions, and compiles/updates markdown pages in a ~/.claude/memory/entities/ directory.

  4. Markdown stays the source of truth. The compiled pages are derived artifacts — regenerable from the raw data. Git tracks changes. Elmar can inspect and correct.
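To make the "compiled truth + timeline" page shape concrete, here is a small sketch of one possible in-memory representation and its markdown rendering. The class name, the section headings, and the example entries are assumptions for illustration; GBrain's actual page layout may differ.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class EntityPage:
    """One entity page: a rewritable summary on top, an append-only log below."""

    name: str
    compiled_truth: str = ""
    timeline: list = field(default_factory=list)  # (iso_date, source, note) tuples

    def add_evidence(self, when: date, source: str, note: str) -> None:
        """Append one dated entry; existing entries are never edited or removed."""
        self.timeline.append((when.isoformat(), source, note))

    def recompile(self, summary: str) -> None:
        """Replace the top section only; the timeline is left untouched."""
        self.compiled_truth = summary

    def render(self) -> str:
        """Render the page as markdown text."""
        lines = [
            f"# {self.name}", "",
            "## Compiled truth", "", self.compiled_truth, "",
            "## Timeline", "",
        ]
        lines += [f"- {day} ({source}): {note}" for day, source, note in self.timeline]
        return "\n".join(lines) + "\n"


# Hypothetical usage with placeholder content.
page = EntityPage("Example Person")
page.add_evidence(date(2026, 4, 10), "email.db", "First contact.")
page.recompile("Summary v1")
page.add_evidence(date(2026, 4, 11), "vault.db", "Linked to Example Project.")
page.recompile("Summary v2")
print(page.render())
```

Keeping the two sections behind separate methods mirrors the pattern's main guarantee: the summary can be rewritten freely, while the evidence that justifies it only ever grows.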


4. Entity Extraction and Enrichment

Current State of the Art

Who Does It Well

What Luci Should Do

Use Claude itself for entity extraction during the dream cycle. The pipeline:

  1. Scan recent activity (email.db, whatsapp-messages.db, vault.db activity_log)
  2. Extract entities: people, organizations, projects, with context
  3. For each entity, check if a compiled page exists
  4. If yes: update the compiled truth section, append to evidence trail
  5. If no: create a new entity page from accumulated evidence
  6. Cross-link entity pages (person → project, person → organization)

This is Tier 2 work — clear requirements, known patterns. The dream cycle script could be a Python scheduler task that calls Claude via the API for extraction.
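Steps 3 to 6 of the pipeline can be sketched as a file-level upsert. This is an assumption-laden outline rather than a finished design: the extraction call to Claude is left out entirely (mentions are passed in as plain arguments), the compiled-truth rewrite in step 4 is reduced to a placeholder, and the slug scheme, section headings, and [[wiki-link]] syntax are all hypothetical choices.

```python
import re
import tempfile
from pathlib import Path


def slugify(name: str) -> str:
    """Turn an entity name into a lowercase, hyphen-separated file stem."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def upsert_entity(pages_dir: Path, name: str, when: str, source: str,
                  note: str, links=()) -> bool:
    """Create the page if it is missing, then append one evidence line.

    Returns True when a new page was created. Rewriting the compiled-truth
    section (an LLM task) is intentionally not part of this sketch.
    """
    path = pages_dir / f"{slugify(name)}.md"
    created = not path.exists()
    if created:
        path.write_text(
            f"# {name}\n\n## Compiled truth\n\n(pending compilation)\n\n## Timeline\n\n",
            encoding="utf-8",
        )
    # Cross-links to other entity pages are appended to the evidence line.
    link_text = "".join(f" [[{slugify(target)}]]" for target in links)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(f"- {when} ({source}): {note}{link_text}\n")
    return created


# Hypothetical run against a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    pages = Path(tmp)
    first = upsert_entity(pages, "Example Person", "2026-04-10", "email.db",
                          "First mention.", links=["Example Project"])
    second = upsert_entity(pages, "Example Person", "2026-04-11", "vault.db",
                           "Second mention.")
    page_text = (pages / "example-person.md").read_text(encoding="utf-8")
    print(page_text)
```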


5. Dream Cycles / Overnight Consolidation

Who Does It

| System | What Happens | When | Measured Impact |
|---|---|---|---|
| Stanford Generative Agents | Reflection: synthesize observations into abstract statements | Inline (importance threshold) | "Single biggest contributor to believable behavior" |
| MemGPT/Letta Sleep-time Compute | Pre-process context, anticipate queries, pre-compute reasoning | Between sessions (idle time) | ~5x compute reduction, up to 13% accuracy improvement |
| GBrain | Scan conversations, enrich entities, fix citations, consolidate | Overnight cron | Not formally measured |
| MemoryBank | Ebbinghaus forgetting curves: decay unaccessed memories | Continuous | Personality adaptation over time |
| A-MEM | Memory evolution: new memories trigger updates to old ones | On new memory arrival | SOTA on 6 models |
| Claude Code Auto-Dream | Prune, merge, refresh memory files | Idle time | Not formally measured |

The Sleep-time Compute Paper (Lin et al., 2025)

This is the strongest academic justification for dream cycles. Key findings:
- ~5x reduction in test-time compute for equivalent accuracy
- Up to 13% accuracy improvement on GSM-Symbolic, 18% on AIME
- 2.5x cost reduction per query through amortization
- Effectiveness correlates with query predictability: consolidation should focus on information the user is likely to ask about again

The insight: decompose the prompt into static context (pre-processable between sessions) and dynamic query (real-time). Sleep-time compute enriches the static context during idle periods.

What Luci Should Build

A lightweight dream cycle as a scheduler task (nightly or every 6 hours):

Phase 1 — Scan: Read recent activity_log entries, new emails, new WhatsApp messages, completed ticket work.

Phase 2 — Extract: Use Claude API to extract entities, facts, and relationships from new data.

Phase 3 — Compile: Update entity pages in ~/.claude/memory/entities/. Update/merge existing memory files. Flag contradictions for human review.

Phase 4 — Prune: Apply forgetting curves — memories not accessed in 30+ days get moved to an archive. Stale project memories (completed projects) get marked as historical.

Phase 5 — Index: Rebuild FTS5 index from all memory files. Update MEMORY.md index.

Phase 6 — Report: Log what changed to activity_log. Optionally send a Telegram summary: "Dream cycle complete: 3 entity pages updated, 2 memories pruned, 1 contradiction flagged."

This is the GBrain dream cycle pattern adapted for Luci's infrastructure — no Postgres, no pgvector, just markdown + SQLite + Claude API.
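The prune phase is the easiest one to pin down in code. The sketch below pairs the 30-day archive rule from Phase 4 with an Ebbinghaus-style retention weight R = exp(-t / S), where t is days since last access. The strength S = 30, the function names, and the sample file names are assumptions; the sketch only plans what to archive and moves no files.

```python
import math
from datetime import date


def retention(days_since_access: float, strength: float = 30.0) -> float:
    """Ebbinghaus-style retention: 1.0 when just accessed, decaying toward 0."""
    return math.exp(-days_since_access / strength)


def plan_prune(last_access_by_file, today, max_idle_days=30):
    """Split memory files into (keep, archive) by days since last access."""
    keep, archive = [], []
    for name, last_access in last_access_by_file.items():
        idle_days = (today - last_access).days
        # "Not accessed in 30+ days" is read as idle_days >= 30.
        (archive if idle_days >= max_idle_days else keep).append(name)
    return sorted(keep), sorted(archive)


# Hypothetical access log.
today = date(2026, 4, 11)
access_log = {
    "recent.md": date(2026, 4, 1),       # idle 10 days
    "borderline.md": date(2026, 3, 12),  # idle exactly 30 days
    "old.md": date(2026, 1, 1),          # idle 100 days
}
keep, archive = plan_prune(access_log, today)
print("keep:", keep)
print("archive:", archive)
```

The retention weight is not used by the hard cutoff itself; it could serve as a soft ranking signal at retrieval time so that stale memories fade before they are archived.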


6. Academic Foundations

The Three Essential Papers

1. Generative Agents (Park et al., Stanford, 2023)
- Introduced the memory stream + reflection + planning architecture
- Reflection was the single biggest quality contributor
- Retrieval scoring: a weighted sum of recency, importance, and relevance, each normalized to [0, 1] (all three needed)
- Limitation: no forgetting, memory grows unbounded

2. Sleep-time Compute (Lin et al., Letta, 2025)
- Academic foundation for dream cycles
- ~5x compute reduction, up to 13-18% accuracy improvement
- Key insight: amortize reasoning across queries during idle time
- Author list includes the MemGPT creators, so there is a direct lineage

3. Episodic Memory Position Paper (Pink et al., 2025)
- Argues Tulving's episodic/semantic distinction is critical for agents
- Episodic: raw events with timestamps ("Elmar said X on 2026-04-10")
- Semantic: extracted facts ("Elmar prefers direct communication")
- The consolidation pathway (episodic → semantic) is the dream cycle
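The retrieval scoring from the Generative Agents paper is compact enough to sketch. In the paper, recency decays exponentially per hour since last access, importance is an LLM-assigned rating, relevance is an embedding similarity, and the three are min-max normalized and summed with equal weights. The field names and sample memories below are assumptions, and relevance is passed in as a precomputed number rather than derived from embeddings.

```python
def min_max(values):
    """Scale values to [0, 1]; if all values are equal, return zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(value - lo) / (hi - lo) for value in values]


def score_memories(memories, now_hours, decay=0.995):
    """Return one score per memory: normalized recency + importance + relevance."""
    recency = [decay ** (now_hours - m["last_access_hours"]) for m in memories]
    importance = [m["importance"] for m in memories]
    relevance = [m["relevance"] for m in memories]
    normalized = [min_max(recency), min_max(importance), min_max(relevance)]
    # Equal weights: the final score is the plain sum of the three signals.
    return [sum(parts) for parts in zip(*normalized)]


# Hypothetical memories: A is fresh but trivial, B is old but important and on-topic.
memories = [
    {"name": "A", "last_access_hours": 99, "importance": 2, "relevance": 0.1},
    {"name": "B", "last_access_hours": 0, "importance": 9, "relevance": 0.9},
    {"name": "C", "last_access_hours": 50, "importance": 5, "relevance": 0.5},
]
scores = score_memories(memories, now_hours=100)
ranked = [m["name"] for _, m in sorted(zip(scores, memories),
                                       key=lambda pair: -pair[0])]
print(ranked)
```

Summing rather than multiplying means a memory that scores zero on one signal can still surface if it is strong on the other two, as B does here despite being the least recent.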

Supporting Papers


7. Expert Commentary and Contested Questions

Where Experts Agree

  1. Memory is essential for agents — every named expert agrees on this
  2. Human-readable, inspectable memory is preferable — Karpathy, Willison, Chase, Anthropic all converge
  3. File systems are a legitimate memory substrate — Chase: "give it access to a file system"; three independent projects converged on markdown
  4. Consolidation/dream cycles add measurable value — Stanford reflections, Letta sleep-time compute, GBrain dream cycles

Where Experts Disagree

RAG vs Long Context:

| Position | Expert |
|---|---|
| RAG is dead; use compiled markdown wikis | Karpathy |
| RAG is NOT dead: economics and context rot | Hamel Husain |
| Context expands to fill limits, so RAG stays relevant | Chip Huyen |
| RAG is a good hack, fine-tuning may matter more long-term | Jerry Liu |
| Long inputs > short prompts, but keep infra simple | Simon Willison |

Vector DBs — Necessary?

| Position | Expert |
|---|---|
| Single-vector embeddings lose critical info; use ColBERT | Hamel Husain |
| No vector DB needed; grep and file trees work better for code | MindStudio analysis |
| Faceted search with structured extraction > top-k vector | Jason Liu |
| Hybrid is fine, optional not required | Karpathy |

Graph Memory — Worth the Overhead?

| Position | Expert |
|---|---|
| Graph databases unnecessary; CSV or Postgres suffice | Hamel Husain |
| Significant operational overhead, not every query benefits | Mem0 analysis |
| Graph adds value for multi-hop entity reasoning | Neo4j/Graphiti advocates |
| Zep/Graphiti: 94.8% DMR, 18.5% accuracy improvement | Rasmussen et al. |

The Contrarian Case: "Just Use Markdown Files"

Three strong arguments:

  1. Cost: $0.02/GB/month local disk vs $50-200/GB/month managed vector DBs
  2. Inspectability: Human can read, edit, correct. No black-box embeddings.
  3. Convergence: Claude Code, GBrain, Manus all independently chose markdown. This is a strong signal.

Counter-arguments:
- Markdown doesn't scale past ~10K files without indexing
- No semantic search (grep misses conceptual matches)
- Temporal queries require additional metadata
- Multi-hop reasoning across files requires iterative LLM calls

Production Failure Lessons


8. Tradeoffs Matrix for Luci

| Criterion | Keep Current Markdown | Add Entity Pages + Dream Cycle | Adopt GBrain | Adopt Mem0 | Build Custom (pgvector + graph) |
|---|---|---|---|---|---|
| Setup cost | Zero (already working) | Low (Python script + scheduler task) | Medium (Postgres, PGLite, or Supabase) | Medium (API integration or self-host) | High (Postgres + pgvector + schema) |
| Retrieval latency | <1ms (grep) | <1ms (grep + FTS5) | 5-50ms (hybrid pgvector + FTS) | ~1s median (hosted), faster self-hosted | 5-50ms |
| Recall quality | Good for exact, poor for semantic | Good for exact + BM25 ranked | Very good (hybrid + RRF) | Best measured (26% over OpenAI) | Depends on implementation |
| Maintenance | Manual MEMORY.md updates | Dream cycle automates most | Must maintain Postgres + PGLite | API dependency or self-host complexity | High ongoing effort |
| Integration with Claude Code | Native (Read/Write/Grep tools) | Native + scheduled task | Requires MCP bridge or API wrapper | Requires API integration | Custom MCP server |
| Inspectability | Excellent (plain markdown) | Excellent (still markdown) | Good (markdown + Postgres) | Poor (opaque memory store) | Depends on implementation |
| Entity tracking | Manual memory files | Auto-compiled entity pages | Built-in (compiled truth + timeline) | Built-in (graph memory) | Custom entity extraction |
| Dream cycles | None | Nightly consolidation script | Built-in (overnight crons) | Not documented | Custom implementation |
| Portability | git clone | git clone | Postgres dependency | API dependency | Postgres dependency |
| Elmar can edit | Yes (markdown) | Yes (markdown) | Partially (markdown, not Postgres) | No | Partially |

9. Decision Framework

The Four Options

Option A: Keep current markdown approach (do nothing)
- Pro: Working, simple, aligned with expert consensus
- Con: No entity compilation, no consolidation, no ranked retrieval
- Risk: Memory quality degrades as corpus grows
- Cost: Zero

Option B: Add entity pages + dream cycle + FTS5 (recommended)
- Pro: Highest value per effort, stays on markdown, adds the three missing capabilities
- Con: Requires building the dream cycle script and entity extraction pipeline
- Risk: Low; it is additive and doesn't change existing infra
- Cost: ~1 week of dev time (Tier 2 ticket)
- This is the "compiled people-pages from existing infra" option from the ticket

Option C: Adopt GBrain prototype
- Pro: Battle-tested by Garry Tan, comprehensive feature set, MIT licensed
- Con: Requires Postgres (PGLite or full), opinionated toward Garry's workflow, tightly coupled to OpenClaw ecosystem
- Risk: Medium; introduces Postgres dependency, may not integrate cleanly with Claude Code's native file tools
- Cost: ~2-3 weeks including migration and adaptation

Option D: Adopt a different pattern (Mem0, Letta, etc.)
- Pro: Most mature production systems with benchmarks
- Con: API dependency (Mem0 hosted) or significant self-hosting complexity (Letta), reduced inspectability
- Risk: Medium-high; vendor lock-in or maintenance burden
- Cost: ~2-4 weeks

Recommendation: Option B — Enhanced Markdown

The evidence strongly supports keeping markdown as the foundation and adding three specific capabilities:

  1. Compiled entity pages (steal from GBrain's "compiled truth + timeline" pattern)
     - Auto-generated markdown pages for people, projects, and organizations
     - Top section: current understanding (rewritable)
     - Bottom section: evidence trail with dates (append-only)
     - Stored in ~/.claude/memory/entities/

  2. Dream cycle (steal from Letta's sleep-time compute + GBrain's overnight crons)
     - Nightly scheduler task
     - Scans recent activity across all data sources
     - Extracts entities, updates entity pages, prunes stale memories
     - Logs changes and optionally sends Telegram summary

  3. SQLite FTS5 index (steal from Willison's SQLite-everything philosophy)
     - Index all memory files, entity pages, and MEMORY.md
     - BM25-ranked search as a retrieval path alongside grep
     - Zero new dependencies

What NOT to build:
- No vector database (not needed at Luci's scale)
- No graph database (entity relationships captured in markdown links)
- No external API dependencies (everything local)
- No Postgres (SQLite is already the standard on Luci)

Patterns to Steal Without Adopting the Whole Stack

| Pattern | Source | How to Apply |
|---|---|---|
| Compiled truth + timeline pages | GBrain | Entity page format with rewritable top + append-only bottom |
| Sleep-time compute / dream cycles | Letta paper + GBrain crons | Nightly scheduler task for consolidation |
| Ebbinghaus forgetting curves | MemoryBank | Decay weight on memories by last-access date |
| Memory evolution | A-MEM | New facts trigger updates to existing entity pages |
| Reflection | Stanford Generative Agents | Dream cycle synthesizes observations into higher-level insights |
| Zettelkasten linking | A-MEM | Entity pages cross-link to related entities |
| Context engineering | Anthropic | Keep memory lean: "smallest set of high-signal tokens" |

10. Contested Questions — Where I (Luci) Stand

RAG vs long context? At Luci's scale (<1,000 memory entries), long context wins. The entire memory corpus fits in the 1M token window. But Chip Huyen's Context Expansion Law applies — as Luci grows, we'll need retrieval. FTS5 is the cheapest hedge.

Vector DBs necessary? No, not yet. pgvectorscale is impressive (471 QPS at 99% recall), but FTS5 at <1ms for keyword search covers Luci's needs. Revisit at 10K+ entries.

Graph memory? No. Entity relationships are better captured as markdown links between compiled entity pages. The overhead of Neo4j/Graphiti is not justified for a single-user agent.

GBrain? Respect the ideas, don't adopt the stack. GBrain's best concepts (compiled truth + timeline, dream cycles, hybrid retrieval) can be implemented on Luci's existing markdown + SQLite infrastructure without Postgres.

Is markdown enough? Yes, with enhancements. Karpathy, Willison, Chase, and Anthropic all validate this approach. The failures reported in production (Boschi 2026) are at scales (500+ files, multi-hop cross-file reasoning) that Luci hasn't hit yet. When we do, FTS5 + entity pages + dream cycles provide the safety net.


Sources

Academic Papers

  1. Park et al. (2023). "Generative Agents: Interactive Simulacra of Human Behavior." https://arxiv.org/abs/2304.03442
  2. Packer et al. (2023). "MemGPT: Towards LLMs as Operating Systems." https://arxiv.org/abs/2310.08560
  3. Zhong et al. (2023). "MemoryBank: Enhancing Large Language Models with Long-Term Memory." https://arxiv.org/abs/2305.10250
  4. Wu et al. (2023). "Recursively Summarizing Enables Long-Term Dialogue Memory." https://arxiv.org/abs/2308.15022
  5. Xu et al. (2025). "A-MEM: Agentic Memory for LLM Agents." https://arxiv.org/abs/2502.12110
  6. Pink et al. (2025). "Episodic Memory is the Missing Piece for Long-Term LLM Agents." https://arxiv.org/abs/2502.06975
  7. Lin et al. (2025). "Sleep-time Compute: Beyond Inference Scaling at Test-time." https://arxiv.org/abs/2504.13171
  8. Chhikara et al. (2025). "Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory." https://arxiv.org/abs/2504.19413
  9. Zhang et al. (2024). "A Survey on the Memory Mechanism of Large Language Model based Agents." https://arxiv.org/abs/2404.13501
  10. Hu et al. (2025). "Memory in the Age of AI Agents: A Survey." https://arxiv.org/abs/2512.13564
  11. Edge et al. (2024). "From Local to Global: A Graph RAG Approach." https://arxiv.org/abs/2404.16130
  12. Rasmussen et al. (2025). "Zep: A Temporal Knowledge Graph Architecture." https://arxiv.org/abs/2501.13956
  13. Li et al. (2025). "Long Context vs. RAG for LLMs." https://arxiv.org/abs/2501.01880

Frameworks & Tools

  1. GBrain. https://github.com/garrytan/gbrain
  2. Letta. https://github.com/letta-ai/letta
  3. Mem0. https://github.com/mem0ai/mem0
  4. LangMem. https://github.com/langchain-ai/langmem
  5. A-MEM. https://github.com/agiresearch/A-mem
  6. Graphiti (Zep). https://github.com/getzep/graphiti
  7. Microsoft GraphRAG. https://github.com/microsoft/graphrag
  8. MCP Knowledge Graph Memory Server. https://github.com/modelcontextprotocol/servers/tree/main/src/memory

Expert Commentary

  1. Karpathy. "LLM Wiki" gist. https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f
  2. Willison. "Things we learned about LLMs in 2024." https://simonw.substack.com/p/things-we-learned-about-llms-in-2024
  3. Husain. "Stop Saying RAG Is Dead." https://hamel.dev/notes/llm/rag/not_dead.html
  4. Jason Liu. "Beyond Chunks: Context Engineering." https://jxnl.co/writing/2025/08/27/facets-context-engineering/
  5. Harrison Chase. Sequoia podcast on context engineering. https://sequoiacap.com/podcast/context-engineering-our-way-to-long-horizon-agents-langchains-harrison-chase/
  6. Anthropic. "Effective context engineering for AI agents." https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents
  7. Mem0. "State of AI Agent Memory 2026." https://mem0.ai/blog/state-of-ai-agent-memory-2026
  8. Letta. "Agent Memory" deep-dive. https://www.letta.com/blog/agent-memory
  9. Chip Huyen. "Agents." https://huyenchip.com/2025/01/07/agents.html
  10. Cognition. "Devin Annual Performance Review 2025." https://cognition.ai/blog/devin-annual-performance-review-2025

Technical References

  1. Alex Garcia. "Hybrid full-text search and vector search with SQLite." https://alexgarcia.xyz/blog/2024/sqlite-vec-hybrid-search/index.html
  2. RAGFlow. "2025 year-end review." https://ragflow.io/blog/rag-review-2025-from-rag-to-context
  3. Guillaume Laforge. "Understanding RRF in Hybrid Search." https://glaforge.dev/posts/2026/02/10/advanced-rag-understanding-reciprocal-rank-fusion-in-hybrid-search/
  4. "AI Agent Memory Management: When Markdown Files Are All You Need." https://dev.to/imaginex/ai-agent-memory-management-when-markdown-files-are-all-you-need-5ekk

Report prepared by Luci. 51 sources loaded in NotebookLM notebook "Memory Systems for AI Agents — Landscape Review." NotebookLM Deep Research report generated separately as a complementary artifact.