Memory Systems for AI Agents

Landscape Review & Decision Framework

Prepared for: Luci (self-assessment) and Elmar (decision-maker)
Date: 2026-04-11
Triggered by: GBrain release (Garry Tan, 2026-04-10)
NotebookLM notebook: "Memory Systems for AI Agents — Landscape Review" (51 sources)

Table of Contents

  Executive Summary
  1. Current Landscape
  2. Retrieval Architectures
  3. Data Model Patterns
  4. Entity Extraction and Enrichment
  5. Dream Cycles / Overnight Consolidation
  6. Academic Foundations
  7. Expert Commentary and Contested Questions
  8. Tradeoffs Matrix for Luci
  9. Decision Framework
  10. Contested Questions: Where I (Luci) Stand
  Sources

Executive Summary

The AI agent memory landscape has matured rapidly. Ten significant frameworks exist, three academic papers provide theoretical grounding, and expert opinion is genuinely divided on key architectural questions. After reviewing 50+ sources, the core finding is this: Luci's current markdown-based memory system is architecturally sound and aligned with the emerging expert consensus (Karpathy, Willison, Harrison Chase all converge on file-based, human-readable memory). The gaps are not in the storage model but in three specific capabilities: (1) entity extraction and compiled people-pages, (2) overnight consolidation/dream cycles, and (3) hybrid retrieval beyond grep.

Recommendation: Keep markdown as source of truth. Add a lightweight dream cycle for consolidation. Add compiled entity pages. Optionally add SQLite FTS5 for hybrid search. Do not adopt GBrain wholesale — steal the "compiled truth + timeline" page pattern and the dream cycle concept instead.

1. Current Landscape

The Ten Systems

| System | Architecture | Storage | Retrieval | Open Source | Production-Ready |
|---|---|---|---|---|---|
| GBrain (Garry Tan) | Markdown + pgvector hybrid | Git-backed .md files + Postgres | Hybrid: vector + keyword + RRF | Yes (MIT) | Personal use |
| MemGPT / Letta | OS-inspired tiered memory | Message buffer + core + recall + archival | Agent-managed paging | Yes | Yes (Letta platform) |
| Mem0 | Universal memory layer | Multi-scope (user/agent/session/org) | Graph + vector + keyword | Yes | Yes (186M API calls/quarter) |
| LangMem | SDK for agent learning | LangGraph BaseStore | Semantic + episodic + procedural | Yes | LangChain ecosystem |
| LlamaIndex | Composable memory modules | Pluggable backends | Vector + summary + composable | Yes | RAG-heavy workflows |
| Obsidian + MCP | Vault as memory backend | .md files in folders | BM25 or grep | Community | Personal use |
| Karpathy LLM Wiki | Compiled markdown knowledge base | raw/ + wiki/ + schema | Optional BM25/hybrid | Concept (gist) | Concept |
| OpenAI Assistants | Server-managed threads | OpenAI servers | Full-thread re-processing | No | Being deprecated (Aug 2026) |
| Anthropic Memory | File-based, client-side | /memories directory | Glob/grep + compaction | Partial (MCP) | Yes (Claude Code) |
| Devin (Cognition) | DeepWiki + proprietary | Proprietary | Code-specialized indexing | No | Enterprise ($500/mo) |

Key Observations

Convergence on files: Three independent high-profile systems (Claude Code, GBrain, Manus) converged on markdown files as the memory substrate. Harrison Chase (LangChain) explicitly stated: "I very, very strongly believe that if you're building a long-horizon agent, you need to give it access to a file system." This is not a coincidence — files are inspectable, versioned, portable, and work with every tool.

MemGPT's conceptual influence: Even systems that don't use MemGPT's code adopt its mental model — tiered memory with working/short-term/long-term layers and agent-managed promotion between tiers. This is the dominant conceptual framework in the field.

Mem0 is the production benchmark: With 186M API calls/quarter and self-published benchmarks (a 26% relative accuracy improvement over OpenAI's memory, plus 91% lower p95 latency and roughly 90% token cost savings compared with a full-context baseline; see Chhikara et al., 2025), Mem0 is the system to beat on quantitative metrics. Its graph memory variant adds entity relationship tracking.

GBrain's specific contribution: The "compiled truth + timeline" page model is genuinely novel. Each entity page has a rewritable top section (current understanding) and an append-only bottom section (evidence trail with dates). This separates the compiled state from the raw evidence — you can always trace why the system believes something.
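To make the page model concrete, here is a minimal sketch of a "compiled truth + timeline" page as a data structure. It is not GBrain's actual code or file format: the class names, the section headings, and the example person are assumptions chosen for illustration. The only property it tries to capture is the one described above, a summary that can be rewritten freely next to an evidence trail that is only ever appended to.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class TimelineEntry:
    when: date
    source: str  # hypothetical labels such as "email" or "whatsapp"
    note: str


@dataclass
class EntityPage:
    name: str
    compiled_truth: str = ""  # rewritable: current understanding
    timeline: list[TimelineEntry] = field(default_factory=list)  # append-only

    def add_evidence(self, when: date, source: str, note: str) -> None:
        """Append to the evidence trail; existing entries are never edited."""
        self.timeline.append(TimelineEntry(when, source, note))

    def rewrite_truth(self, new_summary: str) -> None:
        """Replace the summary; the timeline is left untouched."""
        self.compiled_truth = new_summary

    def to_markdown(self) -> str:
        """Render the page with the summary on top and dated evidence below."""
        lines = [f"# {self.name}", "", "## Compiled truth", "",
                 self.compiled_truth, "", "## Timeline", ""]
        for entry in sorted(self.timeline, key=lambda e: e.when):
            lines.append(f"- {entry.when.isoformat()} ({entry.source}): {entry.note}")
        return "\n".join(lines) + "\n"


# Hypothetical example data.
page = EntityPage("Example Person")
page.add_evidence(date(2026, 4, 9), "whatsapp", "Confirmed the kickoff date.")
page.add_evidence(date(2026, 4, 2), "email", "Introduced as project lead.")
page.rewrite_truth("Leads the hypothetical Acme project.")
markdown = page.to_markdown()
```

Because the timeline is never rewritten, any statement in the top section can be checked against the dated entries below it.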

2. Retrieval Architectures

The Four Approaches

Vector-only (pgvector, Pinecone, Weaviate, Chroma)

Keyword-only (FTS5, Elasticsearch)

Hybrid (RRF + reranking)

Graph-based (Neo4j, GraphRAG, Graphiti)

What Luci Uses Today

Luci currently uses grep + glob (keyword search over markdown files). This is equivalent to basic keyword retrieval with no semantic matching and no ranking. It works because:

  1. The corpus is small (~200 memory files, ~50 vault files)
  2. Entity names are explicit and greppable
  3. MEMORY.md index provides a manual routing layer

The gap: as the corpus grows, grep won't surface semantically related memories that don't share exact keywords.

Recommendation for Luci

Add SQLite FTS5 as a second retrieval path. This is the highest-value, lowest-cost upgrade:

  • Zero new dependencies (SQLite is already used for mc.db, vault.db, email.db)
  • Sub-millisecond BM25 ranking
  • Can index all memory files, vault entries, email subjects, and WhatsApp messages
  • Trivial to implement: CREATE VIRTUAL TABLE memory_fts USING fts5(...)
  • Does NOT require a vector database, embedding model, or GPU
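As a rough illustration of the FTS5 path, the sketch below builds an index and runs a BM25-ranked query using Python's standard sqlite3 module. It assumes the local SQLite build ships with FTS5 enabled, which is common but not guaranteed. The table layout (a `path` and a `body` column) and the sample documents are hypothetical; only the `memory_fts` table name comes from the text above.

```python
import sqlite3


def build_index(conn: sqlite3.Connection, docs: list[tuple[str, str]]) -> None:
    """(Re)build the FTS5 table from (path, body) pairs."""
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS memory_fts "
        "USING fts5(path UNINDEXED, body)"  # path is stored but not searched
    )
    conn.execute("DELETE FROM memory_fts")  # full rebuild, simplest option
    conn.executemany("INSERT INTO memory_fts(path, body) VALUES (?, ?)", docs)
    conn.commit()


def search(conn: sqlite3.Connection, query: str, limit: int = 5) -> list[tuple[str, float]]:
    """Return (path, score) pairs, best match first.

    FTS5's bm25() returns numerically smaller values for better matches,
    so ascending order puts the best hit on top.
    """
    return conn.execute(
        "SELECT path, bm25(memory_fts) AS score "
        "FROM memory_fts WHERE memory_fts MATCH ? "
        "ORDER BY score LIMIT ?",
        (query, limit),
    ).fetchall()


# Hypothetical sample corpus.
conn = sqlite3.connect(":memory:")
build_index(conn, [
    ("people/example-person.md", "Signed contract received on Monday"),
    ("projects/example-project.md", "Kickoff notes and budget"),
    ("misc/todo.md", "Buy coffee"),
])
hits = search(conn, "contract")
```

The default tokenizer does no stemming, so a query for "contract" would not match "contracts"; whether that matters depends on how memory files are written.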

Skip vector search for now. At Luci's scale (<1,000 memory entries), FTS5 + the existing grep/glob is sufficient. Vector search adds value at 10K+ entries. Revisit if the corpus grows.

Skip graph memory. Entity relationships can be captured in compiled entity pages (markdown links) rather than a graph database. The maintenance overhead of Neo4j/Graphiti is not justified for a single-user agent.

3. Data Model Patterns

The Four Models

Markdown-as-source-of-truth (Luci's current model, Karpathy's LLM Wiki, Claude Code)

Append-only event logs (MemGPT/Letta)

Compiled entity pages (GBrain, Karpathy Wiki)

Conversation threads (OpenAI Assistants, Memoria)

What Luci Should Adopt

Keep markdown-as-source-of-truth. Add compiled entity pages as a second layer.

The compiled entity page pattern from GBrain/Karpathy is the single highest-value addition to Luci's memory. Here's why:

  1. Luci already has the raw data. vault.db has entities, edges, and file references. email.db has contacts. whatsapp-messages.db has conversation history. The data exists but isn't compiled.
  2. Entity pages solve the "who is X?" problem. When Elmar mentions "Stephan" or "Chazelle" or "NMG", Luci should be able to pull up a compiled page: who they are, recent interactions, key facts, contact details, linked projects.
  3. The compilation can run as a dream cycle. A nightly script scans vault.db, email.db, and whatsapp-messages.db, extracts entity mentions, and compiles/updates markdown pages in a ~/.claude/memory/entities/ directory.
  4. Markdown stays the source of truth. The compiled pages are derived artifacts — regenerable from the raw data. Git tracks changes. Elmar can inspect and correct.

4. Entity Extraction and Enrichment

Current State of the Art

Who Does It Well

What Luci Should Do

Use Claude itself for entity extraction during the dream cycle. The pipeline:

  1. Scan recent activity (email.db, whatsapp-messages.db, vault.db activity_log)
  2. Extract entities: people, organizations, projects, with context
  3. For each entity, check if a compiled page exists
  4. If yes: update the compiled truth section, append to evidence trail
  5. If no: create a new entity page from accumulated evidence
  6. Cross-link entity pages (person → project, person → organization)

This is Tier 2 work — clear requirements, known patterns. The dream cycle script could be a Python scheduler task that calls Claude via the API for extraction.
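Steps 3 to 5 of the pipeline can be sketched without committing to any particular extraction backend. In the sketch below the extractor is passed in as a plain function, so the Claude API call stays out of scope; `fake_extract`, the page layout, and the sample activity are all hypothetical. Rewriting the compiled truth section (part of step 4) and cross-linking (step 6) are deliberately left out.

```python
import re
import tempfile
from pathlib import Path
from typing import Callable, Iterable

Mention = tuple[str, str]  # (entity name, context snippet)


def slug(name: str) -> str:
    """Turn an entity name into a filesystem-friendly file stem."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def update_entity_pages(
    activity: Iterable[tuple[str, str]],       # (ISO date, raw text)
    extract: Callable[[str], list[Mention]],   # stand-in for the model call
    root: Path,
) -> dict[str, int]:
    """Create missing entity pages and append evidence to every mentioned page."""
    root.mkdir(parents=True, exist_ok=True)
    # pages_updated counts update events, not distinct pages.
    counts = {"pages_created": 0, "pages_updated": 0}
    for day, text in activity:
        for name, context in extract(text):
            page = root / f"{slug(name)}.md"
            if page.exists():
                counts["pages_updated"] += 1
            else:
                page.write_text(
                    f"# {name}\n\n## Compiled truth\n\n(pending)\n\n## Timeline\n\n",
                    encoding="utf-8",
                )
                counts["pages_created"] += 1
            with page.open("a", encoding="utf-8") as fh:
                fh.write(f"- {day}: {context}\n")  # evidence trail is append-only
    return counts


def fake_extract(text: str) -> list[Mention]:
    """Hypothetical extractor: matches a fixed list of known names."""
    known = ["Alice Example", "Acme"]
    return [(name, text) for name in known if name in text]


with tempfile.TemporaryDirectory() as tmp:
    counts = update_entity_pages(
        [("2026-04-10", "Alice Example confirmed the Acme budget"),
         ("2026-04-11", "Acme sent the invoice")],
        fake_extract,
        Path(tmp) / "entities",
    )
    acme_page = (Path(tmp) / "entities" / "acme.md").read_text(encoding="utf-8")
```

Keeping the extractor behind a function boundary means the same loop can be exercised with a fixed name list in tests and with a model-backed extractor in the scheduled task.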

5. Dream Cycles / Overnight Consolidation

Who Does It

| System | What Happens | When | Measured Impact |
|---|---|---|---|
| Stanford Generative Agents | Reflection: synthesize observations into abstract statements | Inline (importance threshold) | "Single biggest contributor to believable behavior" |
| MemGPT/Letta Sleep-time Compute | Pre-process context, anticipate queries, pre-compute reasoning | Between sessions (idle time) | ~5x compute reduction, up to 13% accuracy improvement |
| GBrain | Scan conversations, enrich entities, fix citations, consolidate | Overnight cron | Not formally measured |
| MemoryBank | Ebbinghaus forgetting curves: decay unaccessed memories | Continuous | Personality adaptation over time |
| A-MEM | Memory evolution: new memories trigger updates to old ones | On new memory arrival | SOTA on 6 models |
| Claude Code Auto-Dream | Prune, merge, refresh memory files | Idle time | Not formally measured |

The Sleep-time Compute Paper (Lin et al., 2025)

This is the strongest academic justification for dream cycles. Key findings:

The insight: decompose the prompt into static context (pre-processable between sessions) and dynamic query (real-time). Sleep-time compute enriches the static context during idle periods.

What Luci Should Build

A lightweight dream cycle as a scheduler task (nightly or every 6 hours):

Phase 1 — Scan: Read recent activity_log entries, new emails, new WhatsApp messages, completed ticket work.

Phase 2 — Extract: Use Claude API to extract entities, facts, and relationships from new data.

Phase 3 — Compile: Update entity pages in ~/.claude/memory/entities/. Update/merge existing memory files. Flag contradictions for human review.

Phase 4 — Prune: Apply forgetting curves — memories not accessed in 30+ days get moved to an archive. Stale project memories (completed projects) get marked as historical.

Phase 5 — Index: Rebuild FTS5 index from all memory files. Update MEMORY.md index.

Phase 6 — Report: Log what changed to activity_log. Optionally send a Telegram summary: "Dream cycle complete: 3 entity pages updated, 2 memories pruned, 1 contradiction flagged."
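One way to read Phase 4 is as an exponential forgetting curve with a cutoff. The sketch below uses the textbook form R = exp(-t / S) with a stability S of 30 days and archives anything whose retention has dropped to e^-1 or below, which is the same as "not accessed in 30+ days". The function names and parameter values are assumptions, not MemoryBank's implementation. The sketch also assumes last-access dates are recorded somewhere reliable, since filesystem access times often are not.

```python
import math
from datetime import date


def retention(last_access: date, today: date, stability_days: float = 30.0) -> float:
    """Exponential forgetting curve: 1.0 when just accessed, decaying with age."""
    age_days = max((today - last_access).days, 0)
    return math.exp(-age_days / stability_days)


def split_for_archive(
    memories: dict[str, date],             # path -> last access date
    today: date,
    threshold: float = math.exp(-1),       # reached after exactly stability_days
) -> tuple[list[str], list[str]]:
    """Return (keep, archive) lists of paths, each sorted by path."""
    keep: list[str] = []
    archive: list[str] = []
    for path, last_access in sorted(memories.items()):
        if retention(last_access, today) <= threshold:
            archive.append(path)
        else:
            keep.append(path)
    return keep, archive


# Hypothetical example: one recent memory, one stale one.
today = date(2026, 4, 11)
keep, archive = split_for_archive(
    {"recent.md": date(2026, 4, 1), "stale.md": date(2026, 2, 1)},
    today,
)
```

Using a continuous score rather than a hard date check leaves room to raise the stability of memories that keep getting accessed, if that turns out to be useful.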

This is the GBrain dream cycle pattern adapted for Luci's infrastructure — no Postgres, no pgvector, just markdown + SQLite + Claude API.
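The six phases above could be wired together with a very small runner. The following is only a skeleton under stated assumptions: `DreamState`, `run_dream_cycle`, and the stub phases are hypothetical names, and the real phases would read the databases, call the model, and write files rather than set counters.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DreamState:
    new_items: list[str] = field(default_factory=list)
    pages_updated: int = 0
    memories_pruned: int = 0
    contradictions: int = 0
    completed: list[str] = field(default_factory=list)  # phase names, in run order


Phase = Callable[[DreamState], None]


def run_dream_cycle(phases: list[tuple[str, Phase]]) -> tuple[DreamState, str]:
    """Run the phases in order on shared state and build the summary line."""
    state = DreamState()
    for name, phase in phases:
        phase(state)
        state.completed.append(name)
    summary = (
        f"Dream cycle complete: {state.pages_updated} entity pages updated, "
        f"{state.memories_pruned} memories pruned, "
        f"{state.contradictions} contradiction(s) flagged."
    )
    return state, summary


# Stub phases standing in for the real work.
def scan(state: DreamState) -> None:
    state.new_items = ["email:1", "whatsapp:7"]


def compile_pages(state: DreamState) -> None:
    state.pages_updated = 3
    state.contradictions = 1


def prune(state: DreamState) -> None:
    state.memories_pruned = 2


state, summary = run_dream_cycle(
    [("scan", scan), ("compile", compile_pages), ("prune", prune)]
)
```

Passing the phases in as a list keeps the order explicit and makes it easy to run a subset, for example only scan and compile while the pruning rule is still being tuned.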

6. Academic Foundations

The Three Essential Papers

1. Generative Agents (Park et al., Stanford, 2023)

2. Sleep-time Compute (Lin et al., Letta, 2025)

3. Episodic Memory Position Paper (Pink et al., 2025)

Supporting Papers

7. Expert Commentary and Contested Questions

Where Experts Agree

  1. Memory is essential for agents — every named expert agrees on this
  2. Human-readable, inspectable memory is preferable — Karpathy, Willison, Chase, Anthropic all converge
  3. File systems are a legitimate memory substrate — Chase: "give it access to a file system"; three independent projects converged on markdown
  4. Consolidation/dream cycles add measurable value — Stanford reflections, Letta sleep-time compute, GBrain dream cycles

Where Experts Disagree

RAG vs Long Context

| Position | Expert |
|---|---|
| RAG is dead; use compiled markdown wikis | Karpathy |
| RAG is NOT dead: economics and context rot | Hamel Husain |
| Context expands to fill limits, so RAG stays relevant | Chip Huyen |
| RAG is a good hack, fine-tuning may matter more long-term | Jerry Liu |
| Long inputs > short prompts, but keep infra simple | Simon Willison |

Vector DBs — Necessary?

| Position | Expert |
|---|---|
| Single-vector embeddings lose critical info; use ColBERT | Hamel Husain |
| No vector DB needed; grep and file trees work better for code | MindStudio analysis |
| Faceted search with structured extraction > top-k vector | Jason Liu |
| Hybrid is fine, optional not required | Karpathy |

Graph Memory — Worth the Overhead?

| Position | Expert |
|---|---|
| Graph databases unnecessary; CSV or Postgres suffice | Hamel Husain |
| Significant operational overhead, not every query benefits | Mem0 analysis |
| Graph adds value for multi-hop entity reasoning | Neo4j/Graphiti advocates |
| Zep/Graphiti: 94.8% DMR, 18.5% accuracy improvement | Rasmussen et al. |

The Contrarian Case: "Just Use Markdown Files"

Three strong arguments:

  1. Cost: $0.02/GB/month local disk vs $50-200/GB managed vector DBs
  2. Inspectability: Human can read, edit, correct. No black-box embeddings.
  3. Convergence: Claude Code, GBrain, Manus all independently chose markdown. This is a strong signal.

Counter-arguments:

Production Failure Lessons

8. Tradeoffs Matrix for Luci

| Criterion | Keep Current Markdown | Add Entity Pages + Dream Cycle | Adopt GBrain | Adopt Mem0 | Build Custom (pgvector + graph) |
|---|---|---|---|---|---|
| Setup cost | Zero (already working) | Low (Python script + scheduler task) | Medium (Postgres, PGLite, or Supabase) | Medium (API integration or self-host) | High (Postgres + pgvector + schema) |
| Retrieval latency | <1ms (grep) | <1ms (grep + FTS5) | 5-50ms (hybrid pgvector + FTS) | ~1s median (hosted), faster self-hosted | 5-50ms |
| Recall quality | Good for exact, poor for semantic | Good for exact + BM25 ranked | Very good (hybrid + RRF) | Best measured (26% over OpenAI) | Depends on implementation |
| Maintenance | Manual MEMORY.md updates | Dream cycle automates most | Must maintain Postgres + PGLite | API dependency or self-host complexity | High ongoing effort |
| Integration with Claude Code | Native (Read/Write/Grep tools) | Native + scheduled task | Requires MCP bridge or API wrapper | Requires API integration | Custom MCP server |
| Inspectability | Excellent (plain markdown) | Excellent (still markdown) | Good (markdown + Postgres) | Poor (opaque memory store) | Depends on implementation |
| Entity tracking | Manual memory files | Auto-compiled entity pages | Built-in (compiled truth + timeline) | Built-in (graph memory) | Custom entity extraction |
| Dream cycles | None | Nightly consolidation script | Built-in (overnight crons) | Not documented | Custom implementation |
| Portability | git clone | git clone | Postgres dependency | API dependency | Postgres dependency |
| Elmar can edit | Yes (markdown) | Yes (markdown) | Partially (markdown, not Postgres) | No | Partially |

9. Decision Framework

The Four Options

Option A: Keep current markdown approach (do nothing)

Option B: Add entity pages + dream cycle + FTS5 (recommended)

Option C: Adopt GBrain prototype

Option D: Adopt a different pattern (Mem0, Letta, etc.)

Recommendation: Option B — Enhanced Markdown

The evidence strongly supports keeping markdown as the foundation and adding three specific capabilities:

  1. Compiled entity pages (steal from GBrain's "compiled truth + timeline" pattern)
    • Auto-generated markdown pages for people, projects, and organizations
    • Top section: current understanding (rewritable)
    • Bottom section: evidence trail with dates (append-only)
    • Stored in ~/.claude/memory/entities/
  2. Dream cycle (steal from Letta's sleep-time compute + GBrain's overnight crons)
    • Nightly scheduler task
    • Scans recent activity across all data sources
    • Extracts entities, updates entity pages, prunes stale memories
    • Logs changes and optionally sends Telegram summary
  3. SQLite FTS5 index (steal from Willison's SQLite-everything philosophy)
    • Index all memory files, entity pages, and MEMORY.md
    • BM25-ranked search as a retrieval path alongside grep
    • Zero new dependencies
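If the grep path and the FTS5 path are ever merged into a single result list, Reciprocal Rank Fusion (the fusion method listed for GBrain's hybrid retrieval) is one simple option. The sketch below is a generic RRF implementation, not taken from GBrain. The constant k = 60 is the value commonly used in the RRF literature, and the two input rankings are hypothetical.

```python
from collections import defaultdict


def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists: each document scores sum(1 / (k + rank))."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):  # ranks start at 1
            scores[doc] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by name for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from the two retrieval paths.
grep_hits = ["a.md", "b.md", "c.md"]
fts_hits = ["b.md", "d.md", "a.md"]
fused = rrf([grep_hits, fts_hits])
fused_order = [doc for doc, _ in fused]
```

RRF only looks at rank positions, so it sidesteps the problem that grep has no score at all and BM25 scores are not comparable across queries.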

What NOT to build:

  • No vector database (not needed at Luci's scale)
  • No graph database (entity relationships captured in markdown links)
  • No external API dependencies (everything local)
  • No Postgres (SQLite is already the standard on Luci)

Patterns to Steal Without Adopting the Whole Stack

| Pattern | Source | How to Apply |
|---|---|---|
| Compiled truth + timeline pages | GBrain | Entity page format with rewritable top + append-only bottom |
| Sleep-time compute / dream cycles | Letta paper + GBrain crons | Nightly scheduler task for consolidation |
| Ebbinghaus forgetting curves | MemoryBank | Decay weight on memories by last-access date |
| Memory evolution | A-MEM | New facts trigger updates to existing entity pages |
| Reflection | Stanford Generative Agents | Dream cycle synthesizes observations into higher-level insights |
| Zettelkasten linking | A-MEM | Entity pages cross-link to related entities |
| Context engineering | Anthropic | Keep memory lean: "smallest set of high-signal tokens" |

10. Contested Questions — Where I (Luci) Stand

RAG vs long context? At Luci's scale (<1,000 memory entries), long context wins. The entire memory corpus fits in the 1M token window. But Chip Huyen's Context Expansion Law applies — as Luci grows, we'll need retrieval. FTS5 is the cheapest hedge.

Vector DBs necessary? No, not yet. pgvectorscale is impressive (471 QPS at 99% recall), but FTS5 at <1ms for keyword search covers Luci's needs. Revisit at 10K+ entries.

Graph memory? No. Entity relationships are better captured as markdown links between compiled entity pages. The overhead of Neo4j/Graphiti is not justified for a single-user agent.

GBrain? Respect the ideas, don't adopt the stack. GBrain's best concepts (compiled truth + timeline, dream cycles, hybrid retrieval) can be implemented on Luci's existing markdown + SQLite infrastructure without Postgres.

Is markdown enough? Yes, with enhancements. Karpathy, Willison, Chase, and Anthropic all validate this approach. The failures reported in production (Boschi 2026) are at scales (500+ files, multi-hop cross-file reasoning) that Luci hasn't hit yet. When we do, FTS5 + entity pages + dream cycles provide the safety net.

Sources

Academic Papers

  1. Park et al. (2023). "Generative Agents: Interactive Simulacra of Human Behavior." arxiv.org/abs/2304.03442
  2. Packer et al. (2023). "MemGPT: Towards LLMs as Operating Systems." arxiv.org/abs/2310.08560
  3. Zhong et al. (2023). "MemoryBank: Enhancing Large Language Models with Long-Term Memory." arxiv.org/abs/2305.10250
  4. Wu et al. (2023). "Recursively Summarizing Enables Long-Term Dialogue Memory." arxiv.org/abs/2308.15022
  5. Xu et al. (2025). "A-MEM: Agentic Memory for LLM Agents." arxiv.org/abs/2502.12110
  6. Pink et al. (2025). "Episodic Memory is the Missing Piece for Long-Term LLM Agents." arxiv.org/abs/2502.06975
  7. Lin et al. (2025). "Sleep-time Compute: Beyond Inference Scaling at Test-time." arxiv.org/abs/2504.13171
  8. Chhikara et al. (2025). "Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory." arxiv.org/abs/2504.19413
  9. Zhang et al. (2024). "A Survey on the Memory Mechanism of Large Language Model based Agents." arxiv.org/abs/2404.13501
  10. Hu et al. (2025). "Memory in the Age of AI Agents: A Survey." arxiv.org/abs/2512.13564
  11. Edge et al. (2024). "From Local to Global: A Graph RAG Approach." arxiv.org/abs/2404.16130
  12. Rasmussen et al. (2025). "Zep: A Temporal Knowledge Graph Architecture." arxiv.org/abs/2501.13956
  13. Li et al. (2025). "Long Context vs. RAG for LLMs." arxiv.org/abs/2501.01880

Frameworks & Tools

  1. GBrain. github.com/garrytan/gbrain
  2. Letta. github.com/letta-ai/letta
  3. Mem0. github.com/mem0ai/mem0
  4. LangMem. github.com/langchain-ai/langmem
  5. A-MEM. github.com/agiresearch/A-mem
  6. Graphiti (Zep). github.com/getzep/graphiti
  7. Microsoft GraphRAG. github.com/microsoft/graphrag
  8. MCP Knowledge Graph Memory Server. github.com/.../src/memory

Expert Commentary

  1. Karpathy. "LLM Wiki" gist. gist.github.com/karpathy/...
  2. Willison. "Things we learned about LLMs in 2024." simonw.substack.com/...
  3. Husain. "Stop Saying RAG Is Dead." hamel.dev/.../not_dead.html
  4. Jason Liu. "Beyond Chunks: Context Engineering." jxnl.co/.../facets-context-engineering
  5. Harrison Chase. Sequoia podcast on context engineering. sequoiacap.com/podcast/...
  6. Anthropic. "Effective context engineering for AI agents." anthropic.com/.../effective-context-engineering
  7. Mem0. "State of AI Agent Memory 2026." mem0.ai/blog/...
  8. Letta. "Agent Memory" deep-dive. letta.com/blog/agent-memory
  9. Chip Huyen. "Agents." huyenchip.com/.../agents.html
  10. Cognition. "Devin Annual Performance Review 2025." cognition.ai/blog/...

Technical References

  1. Alex Garcia. "Hybrid full-text search and vector search with SQLite." alexgarcia.xyz/blog/...
  2. RAGFlow. "2025 year-end review." ragflow.io/blog/...
  3. Guillaume Laforge. "Understanding RRF in Hybrid Search." glaforge.dev/posts/...
  4. "AI Agent Memory Management: When Markdown Files Are All You Need." dev.to/imaginex/...