Memory Systems for AI Agents — Seed Dossier

Generated: 2026-04-11 15:36 UTC

Triggered by Garry Tan's GBrain announcement 2026-04-10. Luci currently uses:
- Claude Code built-in memory (MEMORY.md + individual md files in ~/.claude/projects/.../memory/)
- PKA Vault (vault.db knowledge graph + markdown memories)
- SecondBrain Obsidian vault (thousands of notes)
- MC wiki (~12 articles)
- email.db + whatsapp.db with FTS

Question: adopt GBrain, build lightweight compiled entity pages from existing infra, or keep current markdown approach?

Gemini CLI Research Output

Based on the landscape in April 2026, the architecture of AI agent memory has transitioned from "stateless retrieval" (basic RAG) to Stateful Cognitive Architectures. The following report synthesizes the current implementation patterns, academic foundations, and expert debates defining the field.

1. Specific Implementations & Architectures

The current market is divided into "Operating System" models and "Compiled Knowledge" models.

Letta (formerly MemGPT): Implements the LLM-as-an-OS thesis. It uses a multi-tier memory hierarchy: Core Memory (read/write scratchpad in-context), Recall Memory (event logs), and Archival Memory (infinite vector storage). Agents use tools like core_memory_append to autonomously update their internal state.
- Source: letta.com / Letta Team. Summary: A stateful framework that gives LLMs a persistent "virtual context" beyond physical window limits.
GBrain (Garry Tan): A "knowledge backbone" that treats Git-based Markdown as the system of record. It emphasizes human-readability and version control over opaque databases.
- Source: github.com/garrytan/gbrain / Garry Tan. Summary: A personal knowledge repo that uses an agent loop to "compile" chats into structured entity pages.
Mem0: A managed memory layer that extracts atomic facts (e.g., "User prefers Rust for CLI tools") from conversations. It uses a graph-vector hybrid to maintain personalization across sessions.
- Source: mem0.ai / Taranjeet Singh. Summary: A production-ready memory layer focused on real-time fact extraction and user profiling.
Karpathy’s LLM Wiki: A pattern where the agent acts as a "compiler" rather than a search engine. Raw logs are "compiled" into a structured, interlinked wiki.
- Source: karpathy.ai / Andrej Karpathy. Summary: Proposes shifting from "query-time retrieval" to "ingest-time reasoning" by maintaining a curated knowledge base.
Obsidian + MCP: Uses the Model Context Protocol to turn a local Markdown vault into a shared memory layer for multiple agents (Claude, Gemini, etc.).
- Source: anthropic.com/mcp / Anthropic. Summary: An open standard for agents to access local data and tools, enabling "Local-First" persistent memory.
OpenAI & Anthropic: OpenAI uses the Responses API for unified state management; Anthropic relies on Artifacts and MCP for workspace-specific memory.
Google Gemini: Emphasizes Long-Context as Memory, using a 2M+ token window to ingest entire projects, bypassing retrieval logic entirely.
- Source: blog.google / Google DeepMind. Summary: Leverages massive context windows to provide "brute force" recall without the need for traditional RAG.
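The Letta-style hierarchy described earlier in this section (an in-context core, a recall log, and an archival store) can be sketched as a toy data structure. This is a minimal illustration under stated assumptions: the class name, the character budget, and the substring search standing in for vector search are all hypothetical and are not Letta's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class TieredMemory:
    """Toy three-tier memory; names are illustrative, not Letta's API."""
    core_limit: int = 200                               # max characters kept in-context
    core: str = ""                                      # always-visible scratchpad
    recall: list[str] = field(default_factory=list)     # full event log
    archival: list[str] = field(default_factory=list)   # long-term overflow store

    def log_event(self, message: str) -> None:
        # Every message is appended to the recall log, never edited.
        self.recall.append(message)

    def core_append(self, fact: str) -> None:
        # Append to core memory; evict the oldest lines to archival when over budget.
        lines = [ln for ln in self.core.split("\n") if ln] + [fact]
        while len("\n".join(lines)) > self.core_limit and len(lines) > 1:
            self.archival.append(lines.pop(0))
        self.core = "\n".join(lines)

    def archival_search(self, query: str) -> list[str]:
        # Stand-in for vector search: case-insensitive substring match.
        q = query.lower()
        return [item for item in self.archival if q in item.lower()]


mem = TieredMemory(core_limit=30)
mem.core_append("User prefers Rust")
mem.core_append("User lives in Berlin")  # pushes the older fact out of core
```

The design point the sketch tries to show is that eviction moves facts to a searchable store rather than deleting them, so the agent can page them back in on demand.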

2. Retrieval Architectures

Hybrid (RRF + Reranking): The standard for 2026. Reciprocal Rank Fusion (RRF) merges the ranked results of keyword search (BM25) and vector search (dense embeddings). A reranker (such as BGE-Reranker or Cohere Rerank v3) then re-scores the top results.
ColBERT & Late Interaction: Stores token-level embeddings instead of document-level ones, allowing for extreme precision in finding "needle-in-a-haystack" details.
Graph-Based (Neo4j): Used for Relational Memory. While vectors find "similar" things, graphs find "related" things (e.g., "Who was the engineer that Garry Tan introduced me to in 2024?").
Latent Memory (KV Caches): The newest frontier. Vendors allow agents to "resume" thoughts by reusing Key-Value caches from previous turns, which avoids recomputing the shared prefix and cuts latency.
- Source: latent.space / Swyx & Alessio Fanelli. Summary: Discusses the shift toward "latent state" where the model's internal activations serve as a temporary, fast-access memory.
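The hybrid approach above relies on Reciprocal Rank Fusion to merge a keyword ranking with a vector ranking. A minimal sketch follows, assuming each retriever returns an ordered list of document IDs; the document IDs are made up, and k = 60 is the conventional smoothing constant rather than anything prescribed by the systems discussed here.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists: each document scores sum(1 / (k + rank))."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["doc_a", "doc_b", "doc_c"]     # hypothetical keyword results
vector_hits = ["doc_c", "doc_a", "doc_d"]   # hypothetical embedding results
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

Because RRF uses only ranks, it needs no score normalization between BM25 and cosine similarity, which is a large part of why it is a common default. A reranker would then be applied to the head of `fused`.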

3. Data Model Patterns

Markdown-as-Source-of-Truth: (GBrain, Obsidian) Human-editable, git-traceable files. Favored for transparency and "Long-term Durability."
Append-only Event Logs: (MemGPT/Letta) Every message and tool call is stored. This allows the agent to "re-read" the history to correct its own past reasoning errors.
Recursive Summarization: (Wu et al. 2021) Agents compress history hierarchically: 10 chats → 1 summary; 10 summaries → 1 digest.
- Source: arxiv.org/abs/2109.10862 / Jeff Wu (OpenAI). Summary: Foundational work on recursively summarizing long documents to maintain hierarchical context.
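The "10 chats → 1 summary; 10 summaries → 1 digest" pattern can be sketched as a loop that repeatedly summarizes fixed-size groups until one item remains. This is an illustration only: `summarize_stub` is a hypothetical placeholder for an LLM call, and the fan-in of 10 simply mirrors the ratio quoted above.

```python
from typing import Callable


def summarize_stub(texts: list[str]) -> str:
    # Placeholder for an LLM call: keeps the first sentence of each input.
    return " ".join(t.split(".")[0].strip() + "." for t in texts)


def compress(items: list[str], fan_in: int = 10,
             summarize: Callable[[list[str]], str] = summarize_stub) -> list[list[str]]:
    """Return all levels: level 0 is the raw items, each higher level
    summarizes groups of `fan_in` items from the level below."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    levels = [items]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([summarize(prev[i:i + fan_in])
                       for i in range(0, len(prev), fan_in)])
    return levels


chats = [f"Chat {i}. Details." for i in range(100)]   # hypothetical raw chats
levels = compress(chats)
```

Keeping every level, rather than only the final digest, preserves a path back to the raw material when a summary turns out to be wrong.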

4. Entity Extraction & Enrichment Loops

The "Best-in-Class" systems (like GBrain and Mem0) use a Multi-Step Extraction Protocol:
1. Disambiguation: Does "Sam" refer to Sam Altman or the user's friend Sam? (Uses global brain context to resolve.)
2. Temporal Validity: Handles "stale" facts (e.g., "The user used to work at Google, now at Vercel").
3. Structured Extraction: Using libraries like Jason Liu’s instructor, agents extract data into strict Pydantic schemas to ensure machine-readability.
- Source: jxnl.github.io/instructor / Jason Liu. Summary: Advocates for "Structured Outputs" as the only reliable way to build programmable agent memory.
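The temporal-validity and structured-extraction steps above can be sketched with a typed fact record that carries a validity interval. To stay dependency-free this uses standard-library dataclasses rather than Pydantic or instructor; the field names, the dates, and the "one open fact per subject and predicate" rule are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Fact:
    subject: str                      # resolved entity id, e.g. "user"
    predicate: str                    # e.g. "works_at"
    value: str
    valid_from: date
    valid_to: Optional[date] = None   # None means "still true"


def assert_fact(store: list[Fact], new: Fact) -> None:
    """Add a fact, closing any open fact with the same subject and predicate."""
    for old in store:
        if (old.subject, old.predicate) == (new.subject, new.predicate) \
                and old.valid_to is None:
            old.valid_to = new.valid_from   # the old fact becomes history
    store.append(new)


def current(store: list[Fact], subject: str, predicate: str) -> Optional[str]:
    """Return the value of the fact that is still open, if any."""
    for fact in store:
        if (fact.subject, fact.predicate) == (subject, predicate) \
                and fact.valid_to is None:
            return fact.value
    return None


store: list[Fact] = []
assert_fact(store, Fact("user", "works_at", "Google", date(2020, 1, 1)))   # hypothetical dates
assert_fact(store, Fact("user", "works_at", "Vercel", date(2025, 6, 1)))
```

Closing the old fact instead of overwriting it keeps "used to work at Google" answerable, which is the distinction the stale-fact example is pointing at.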

5. Academic Foundations

Generative Agents (Park et al. 2023): Introduced the "Memory Stream" and the idea of agents "reflecting" on their memories to create higher-level goals.
MemoryBank (Zhong et al.): Applies the Ebbinghaus Forgetting Curve, deliberately letting low-utility information fade to prevent "context poisoning."
- Source: arxiv.org/abs/2305.10250 / Wanjun Zhong. Summary: A framework for long-term memory that manages storage via an "importance" scoring system.
A-MEM (Agentic Memory, Xu et al.): Research into how agents organize their own memory, categorizing, consolidating, and forgetting rather than merely appending and retrieving.
- Source: arxiv.org/abs/2502.12110 / Xu et al. Summary: Proposes dynamic, agentic organization of memory inspired by Zettelkasten-like principles.
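The forgetting-curve idea mentioned above can be illustrated with the usual exponential form R = exp(-t / S), where t is time since last access and S is a memory strength that grows on recall. This is a sketch under assumptions: the unit of time (days), the reinforcement step of +1, and the pruning threshold are illustrative choices, not values taken from the MemoryBank paper.

```python
import math
from dataclasses import dataclass


@dataclass
class MemoryItem:
    text: str
    strength: float = 1.0            # S: grows each time the memory is recalled
    days_since_access: float = 0.0   # t: time since the memory was last used


def retention(item: MemoryItem) -> float:
    # Exponential forgetting curve: R = exp(-t / S)
    return math.exp(-item.days_since_access / item.strength)


def recall(item: MemoryItem) -> None:
    # Recalling a memory reinforces it and resets its clock.
    item.strength += 1.0
    item.days_since_access = 0.0


def prune(items: list[MemoryItem], threshold: float = 0.1) -> list[MemoryItem]:
    # Keep only memories whose retention is still at or above the threshold.
    return [it for it in items if retention(it) >= threshold]


weak = MemoryItem("one-off remark", strength=1.0, days_since_access=5.0)
strong = MemoryItem("recurring preference", strength=5.0, days_since_access=5.0)
kept = prune([weak, strong])
```

In a markdown-first setup, "pruning" would more plausibly mean demoting an item out of the retrieval index than deleting the underlying note.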

6. Dream Cycles (Overnight Consolidation)

Pioneered by GBrain and OpenClaw, the Dream Cycle is a background process that runs while the user is away.
* Mechanism: The agent scans the day's logs, detects new entities, resolves contradictions between new notes and old notes, and generates a DREAMS.md report.
* Impact: Measurable improvement in "Recall Precision" (up to 40% in benchmarks like LoCoMo) because the data is "cleaned" and "indexed" before the user asks a question.
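The mechanism described above (scan the day's logs, detect new entities, resolve contradictions, write a report) can be sketched as a single batch function. Everything here is hypothetical: the "Entity: claim" log format, the "newest claim wins" resolution rule, and the report layout are assumptions for illustration and do not reflect GBrain's actual implementation.

```python
def dream_cycle(day_log: list[str], known: dict[str, str]) -> str:
    """Scan log lines of the form 'Entity: claim', update `known` in place,
    and return a markdown report of what changed."""
    new_entities: list[str] = []
    contradictions: list[str] = []
    for line in day_log:
        entity, sep, claim = line.partition(":")
        entity, claim = entity.strip(), claim.strip()
        if not sep or not entity or not claim:
            continue  # skip lines that are not "Entity: claim"
        if entity not in known:
            new_entities.append(entity)
        elif known[entity] != claim:
            contradictions.append(f"{entity}: '{known[entity]}' -> '{claim}'")
        known[entity] = claim  # assumed resolution rule: newest claim wins
    report = ["# Dream report", "", "## New entities"]
    report += [f"- {name}" for name in new_entities] or ["- none"]
    report += ["", "## Contradictions (newest claim kept)"]
    report += [f"- {item}" for item in contradictions] or ["- none"]
    return "\n".join(report)


known_claims = {"Alice": "works at Acme"}   # hypothetical prior state
report = dream_cycle(
    ["Alice: works at Initech", "Bob: met at a conference", "no colon here"],
    known_claims,
)
```

A real consolidation job would use an LLM for entity detection and would likely flag contradictions for review instead of resolving them silently.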

7. Tradeoffs for Personal Agents (Recommended Stack)

Factor	Recommended Stack (2026)
Storage	Git-based Markdown (for transparency) + pgvector (for search).
Retrieval	Hybrid (BM25 + vector search, fused with RRF) with a fast Reranker (Latency < 500ms).
Complexity	High (Self-hosted GBrain) vs. Low (OpenAI Assistants).
Privacy	Local-First (Obsidian MCP) is the only credible choice for sensitive data.

8. The Contested: Expert Disagreements

RAG vs. Long-Context: Chip Huyen argues "Data grows faster than context," meaning RAG is inevitable. Google argues context windows will eventually scale to 100M+ tokens, making RAG obsolete.
- Source: huyenchip.com / Chip Huyen. Summary: Highlights the "RAG is a data problem, not a context problem" perspective.
Vector DBs vs. Keyword + Reranking: Eugene Yan and Simon Willison argue that for many "Personal Brain" tasks, simple SQL + Keyword search + a Reranker outperforms complex Vector DB setups.
- Source: eugeneyan.com / Eugene Yan. Summary: Criticizes the "Vector-only" hype, promoting multi-stage retrieval pipelines instead.
Forgetting: Hamel Husain emphasizes that "Forgetting is a feature." Stale context leads to "context poisoning," where the agent gets confused by old, irrelevant instructions.
- Source: hamel.dev / Hamel Husain. Summary: Discusses the operational challenges of "Memory Pruning" in production agents.

Credible Sources List

Papers:
* Generative Agents: Park et al. 2023 - The original "Sims" agent memory architecture.
* MemoryBank: Zhong et al. 2024 - Biologically-inspired long-term persistence.
* Recursive Summarizing: Wu et al. 2021 - Foundational NLP work on compression.

Blog Posts / Repos:
* GBrain: github.com/garrytan/gbrain - Garry Tan's "World Knowledge" architecture.
* LLM Wiki: karpathy.ai/blog - Andrej Karpathy on the "Compilation" vs "Retrieval" debate.
* Memory Engineering: huyenchip.com - Chip Huyen on the production realities of memory.
* Agent TDD: simonwillison.net - Simon Willison on making agent memory verifiable.

Podcasts / Frameworks:
* Latent Space: latent.space (Swyx/Alessio) - The definitive source for "Latent Memory" and "Agent Harness" discussions.
* Mem0: mem0.ai - The lead "Memory-as-a-Service" framework.
* Letta: letta.com - The evolution of the MemGPT project into an Agent OS.

Codex CLI Research Output

OpenAI Codex v0.116.0 (research preview)

workdir: /home/lucienne/workspace/luci-home
model: gpt-5.4
provider: openai
approval: never
sandbox: read-only
reasoning effort: none
reasoning summaries: none
session id: 019d7d2b-f0ed-7953-b98b-fcce2186ab0f

user

Deep research on the CURRENT (2026) landscape of memory systems for AI agents. Cover:

1. SPECIFIC implementations with their architecture: MemGPT / Letta, Mem0, LangMem, LlamaIndex memory, GBrain (Garry Tan github.com/garrytan/gbrain), Obsidian + MCP patterns, Karpathy's LLM wiki pattern, OpenAI Assistants API memory, Anthropic's agent memory approach, Cognition / Devin's long-term memory, Google's Gemini long-context as memory alternative.
2. RETRIEVAL ARCHITECTURES: vector-only (pgvector, Pinecone, Weaviate, Chroma), keyword-only (Postgres FTS, Tantivy), hybrid (RRF + reranking, ColBERT), graph-based (Neo4j, knowledge graphs), and long-context windows as an alternative to retrieval entirely.
3. DATA MODEL PATTERNS: markdown-as-source-of-truth (GBrain, Obsidian), append-only event logs (MemGPT conversation memory), compiled entity pages (people/companies/concepts), conversation threads, episodic vs semantic split.
4. ENTITY EXTRACTION and enrichment loops: how the best systems auto-extract people/companies/topics from conversations, disambiguate, handle temporal validity, and compile dossiers. Who's doing it well.
5. ACADEMIC FOUNDATIONS: Generative Agents (Park et al. Stanford 2023), MemoryBank (Zhong et al.), Recursively Summarising (Wu et al.), A-MEM, episodic memory literature, cognitive-science-inspired designs.
6. DREAM CYCLES / overnight consolidation: which systems do it (GBrain does), what they do at night, and whether there's measurable impact on quality.
7. TRADEOFFS: for a single-user personal agent, what's the recommended stack? Setup cost vs retrieval latency vs recall quality vs maintenance vs integration complexity.
8. CONTESTED: where do experts disagree? RAG vs long-context debate, vector DBs vs keyword+reranking, graph memory necessity, markdown-as-truth vs database-of-record.

Credibility requirements — cite named analysts: Andrej Karpathy, Simon Willison, Chip Huyen, Jason Liu, Hamel Husain, Swyx / Latent Space, Garry Tan, Eugene Yan, Shawn @swyx. Papers over blog posts where possible. Code + benchmarks over claims. Explicit contrarian views.

For EVERY claim, output URL + author/publication + one-sentence summary. List all credible sources at the end grouped by type (paper / blog post / repo / podcast / framework doc).

codex

Snapshot

As of April 11, 2026, the field has converged on one practical idea: good agent memory is usually a layered system, not a single database. The strongest production designs combine some mix of working memory in-context, retrieval over external stores, and periodic consolidation into durable summaries, entity pages, or graphs. The biggest split is no longer “memory or no memory,” but what should be compiled ahead of time vs. retrieved on demand vs. left in a giant context window.

1. Current implementations

MemGPT / Letta is the clearest “memory hierarchy” implementation: a small always-visible core memory plus recall memory for prior conversation and archival memory for long-term semantic storage, with the agent itself deciding when to edit or search memory.
Source: https://docs.letta.com/guides/agents/architectures/memgpt — Letta Docs, “Agent memory & architecture” — describes core memory, recall memory, archival memory, and tool-mediated self-editing; https://docs.letta.com/concepts/letta — Letta Docs, “Research Background” — ties Letta’s architecture directly to the MemGPT paper’s OS-style memory hierarchy.

Mem0 is a “memory extraction and retrieval” system, not just chat history persistence: it extracts salient facts from conversations, stores them in vector memory, and can optionally mirror entities/relations into a graph store.
Source: https://docs.mem0.ai/features/contextual-add — Mem0 Docs, “Overview” — says Mem0 combines LLM extraction with vector storage; https://docs.mem0.ai/open-source/graph_memory/overview — Mem0 Docs, “Graph Memory” — says Mem0 extracts entities, relationships, and timestamps on write, stores embeddings in a vector DB, and adds graph context in parallel at retrieval time.
LangMem sits between framework and memory toolkit: it provides hot-path memory tools plus a background memory manager that extracts, consolidates, and updates knowledge over time, using LangGraph stores for persistence.
Source: https://github.com/langchain-ai/langmem — LangChain, GitHub repo README — says LangMem provides extraction, prompt refinement, long-term memory, hot-path tools, and a background memory manager.
LlamaIndex memory is explicitly block-based: short-term chat history flushes into long-term memory blocks, and those blocks can be static, fact-extraction, or vector-backed retrieval blocks.
Source: https://developers.llamaindex.ai/python/framework/module_guides/deploying/agents/memory/ — LlamaIndex Docs, “Memory” — documents token_limit, token_flush_size, short-term flushing, and StaticMemoryBlock, FactExtractionMemoryBlock, and VectorMemoryBlock.
GBrain is the most visible 2026 markdown-first pattern: markdown files are the source of truth in git; Postgres/pgvector provide hybrid retrieval; the workflow centers on entity detection, dossier pages, and a “dream cycle” for nightly consolidation.
Source: https://github.com/garrytan/gbrain — Garry Tan, GitHub repo README — states markdown-in-git is the source of truth, retrieval is Postgres + pgvector + hybrid search, and the dream cycle scans conversations, enriches entities, fixes citations, and consolidates memory.
Obsidian + MCP patterns are less a single product than a family of file-native memory systems: agents write markdown notes or entity files into an Obsidian vault, often with wiki-links or frontmatter as structure.
Source: https://github.com/YuNaga224/obsidian-memory-mcp — YuNaga224, GitHub repo README — stores each entity as a markdown file with YAML frontmatter and [[links]]; https://github.com/bitbonsai/mcp-obsidian — bitbonsai, GitHub repo README — exposes safe read/write/search access to an Obsidian vault over MCP.
Karpathy’s “LLM wiki” pattern is the cleanest statement of the markdown-as-compiled-memory idea: immutable raw sources, LLM-maintained wiki pages, and a schema file that tells the agent how to ingest, update, and query the wiki.
Source: https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f — Andrej Karpathy, GitHub Gist “llm-wiki” — defines the three-layer architecture of raw sources, wiki, and schema, with ingest updating many pages at once.
OpenAI API memory today is mostly conversation state, compaction, and hosted retrieval, not a full agent memory framework. The deprecated Assistants API kept thread state; the current Responses + Conversations APIs persist items, and /responses/compact shrinks old state, while File Search adds hosted semantic+keyword retrieval over vector stores.
Source: https://platform.openai.com/docs/guides/conversation-state?api-mode=responses — OpenAI Docs, “Conversation state” — documents durable conversations and stateless compaction; https://platform.openai.com/docs/guides/tools-file-search — OpenAI Docs, “File search” — says File Search uses semantic and keyword search over vector stores; https://platform.openai.com/docs/assistants — OpenAI Docs, “Assistants migration guide” — states Assistants are deprecated in favor of Responses/Conversations.
OpenAI’s product memory in ChatGPT is separate from the API story: ChatGPT has saved memories plus chat-history-based personalization, but custom GPTs still do not get this as builder-level persistent memory.
Source: https://openai.com/index/memory-and-new-controls-for-chatgpt — OpenAI, product post — explains saved memories and chat history; https://help.openai.com/en/articles/8983148-does-memory-function-with-gpts — OpenAI Help — says custom GPTs are stateless and do not currently support memory.
Anthropic’s public memory story has two layers: product memory in Claude app is project-scoped summary memory, while developer-side MCP memory is a local knowledge graph of entities, relations, and observations.
Source: https://www.anthropic.com/news/memory — Anthropic, product post — says Claude app memory is project-scoped and exposed as a memory summary; https://github.com/modelcontextprotocol/servers/blob/main/src/memory/README.md — Model Context Protocol / Anthropic-managed repo, README — defines the reference Memory server as a knowledge graph with entities, relations, and observations.
Cognition / Devin is visibly investing in retrieval and context infrastructure, but its long-term memory architecture is not publicly specified at the same depth as Letta/Mem0/LangMem. Public signals point more to “context graphs” and specialized retrieval than to a published personal-memory stack.
Source: https://cognition.ai/ — Cognition site — lists “Agent Trace: Capturing the Context Graph of Code” and SWE retrieval work, but does not publicly document a full long-term memory architecture.
Inference: public evidence supports “retrieval/context graph emphasis,” not a detailed memory hierarchy claim.
Google Gemini’s million-token context is a real memory alternative for bounded tasks, but Google itself frames it as short-term working memory, not durable cross-session memory.
Source: https://ai.google.dev/gemini-api/docs/long-context — Google AI for Developers, “Long context” — calls the context window analogous to short-term memory and positions 1M+ context as an alternative to dropping, summarizing, or RAG for some workloads.

2. Retrieval architectures

Vector-only retrieval remains the default baseline for agent memory because it is simple and widely integrated, but it is weak on exact strings, IDs, names, and sparse symbolic facts.
Source: https://docs.mem0.ai/features/contextual-add — Mem0 Docs — describes vector-based storage as the base layer; https://developers.llamaindex.ai/python/framework/module_guides/deploying/agents/memory/ — LlamaIndex Docs — describes vector memory blocks retrieving flushed chat batches.

Keyword-only retrieval is underrated for agent memory, especially for names, code identifiers, dates, and literal phrases. Jason Liu’s retrieval writing and many production search stacks now treat lexical search as complementary, not obsolete.
Source: https://jxnl.co/writing/2025/09/11/lexical-search-in-rag-applications/ — Jason Liu, blog post — argues the real question is how to combine lexical and semantic retrieval, because lexical search remains important for precision-heavy queries.
Hybrid retrieval is the current practical default for production systems: combine lexical and semantic search, fuse results, then rerank. OpenAI File Search does semantic+keyword; GBrain explicitly advertises vector+keyword+RRF; Mem0 optionally reranks vector hits.
Source: https://platform.openai.com/docs/guides/tools-file-search — OpenAI Docs — says File Search uses semantic and keyword search; https://github.com/garrytan/gbrain — Garry Tan, repo README — says GBrain uses hybrid search with vector, keyword, and RRF; https://docs.mem0.ai/open-source/features/graph-memory — Mem0 Docs — says search is vector-based and can be reranked.
Graph-based retrieval helps when the question is relational or multi-hop: who knows whom, what company is linked to which project, what happened when, and what changed over time. Mem0 Graph Memory and Anthropic’s MCP memory both make this design explicit.
Source: https://docs.mem0.ai/open-source/graph_memory/overview — Mem0 Docs — says graph memory is useful when vectors blur multiple actors and objects; https://github.com/modelcontextprotocol/servers/blob/main/src/memory/README.md — MCP Memory README — defines entities, relations, and observations as first-class storage.
Long-context windows are now a genuine alternative to retrieval for some workloads, especially when the corpus is small, fixed, and benefits from full-document reasoning. But long-context still degrades in practice as irrelevant material accumulates.
Source: https://ai.google.dev/gemini-api/docs/long-context — Google Docs — positions long context as replacing summarization/RAG in some cases; https://simonwillison.net/2025/Jun/18/context-rot/ — Simon Willison quoting practitioner observation — summarizes “context rot,” where quality degrades as stale or distracting tokens accumulate; https://hamel.dev/notes/llm/rag/not_dead.html — Hamel Husain and Ben Clavié — argues bigger context does not change the economics or need for retrieval.
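The relational, multi-hop questions that graph-based retrieval targets in this section can be sketched with a tiny in-memory graph of entities, relations, and observations. The class, method names, and sample data are hypothetical; this mirrors the entity/relation/observation vocabulary used above but is not the MCP memory server's API.

```python
from collections import defaultdict


class EntityGraph:
    """Toy knowledge graph: observations hang off entities, relations are typed edges."""

    def __init__(self) -> None:
        self.observations: dict[str, list[str]] = defaultdict(list)
        self.edges: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def observe(self, entity: str, note: str) -> None:
        self.observations[entity].append(note)

    def relate(self, source: str, relation: str, target: str) -> None:
        self.edges[source].append((relation, target))

    def follow(self, start: str, *relations: str) -> list[str]:
        """Follow a chain of relation types from `start`; return the entities reached."""
        frontier = [start]
        for relation in relations:
            frontier = [target
                        for node in frontier
                        for (rel, target) in self.edges.get(node, [])
                        if rel == relation]
        return frontier


graph = EntityGraph()
graph.relate("alice", "introduced", "dana")      # hypothetical people and companies
graph.relate("alice", "introduced", "lee")
graph.relate("dana", "works_at", "acme")
graph.relate("lee", "works_at", "initech")
graph.observe("dana", "Engineer, met in 2024")
employers = graph.follow("alice", "introduced", "works_at")
```

A two-hop query such as "where do the people Alice introduced me to work" is an exact traversal here, whereas a vector search would have to hope the relevant sentences embed close to the question.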

3. Data model patterns

Markdown-as-source-of-truth is the fastest-growing pattern in personal-agent memory because it is human-auditable, git-friendly, and easy for agents to patch. GBrain and Karpathy’s LLM wiki are the canonical 2026 examples.
Source: https://github.com/garrytan/gbrain — Garry Tan, repo README — says markdown files in git are the source of truth; https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f — Andrej Karpathy, gist — describes a wiki layer of markdown pages maintained by the LLM.

Append-only logs remain important even in compiled systems, because they preserve provenance and let you rebuild summaries or dossiers. Karpathy explicitly keeps a log; GBrain uses append-only timelines on pages; Letta compacts but keeps older history in recall storage.
Source: https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f — Andrej Karpathy, gist — says ingest appends an entry to the log; https://github.com/garrytan/gbrain — Garry Tan, repo README — describes compiled truth on top and append-only timeline on the bottom; https://docs.letta.com/guides/agents/architectures/memgpt — Letta Docs — says older messages move to recall memory with recursive summaries.
Compiled entity pages are emerging as the most useful personal-memory abstraction: one page per person, company, concept, or project, with a stable summary plus evidence trail.
Source: https://github.com/garrytan/gbrain — Garry Tan, repo README — describes one page per person/company and “compiled dossiers”; https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f — Andrej Karpathy, gist — recommends entity pages, concept pages, summaries, comparisons, and syntheses.
The best systems now separate episodic and semantic memory, even when the storage layer is shared. Episodic memory keeps experiences/conversations; semantic memory keeps distilled facts and concepts.
Source: https://docs.mem0.ai/features/webhooks — Mem0 Docs — explicitly lists episodic and semantic memory types; https://openreview.net/forum?id=y59hf5lrMn — Tavakoli et al., ICLR 2026 BEAM/LIGHT — proposes separate long-term episodic memory, working memory, and a scratchpad.
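The compiled-entity-page pattern described in this section, with a stable summary on top and an append-only evidence trail below, can be sketched as a small renderer. The frontmatter keys, section titles, and sample entity are assumptions for illustration; the actual page layout used by GBrain or an LLM wiki may differ.

```python
from dataclasses import dataclass, field


@dataclass
class EntityPage:
    name: str
    kind: str                      # e.g. "person", "company", "concept"
    summary: str = ""              # compiled truth: rewritten on each consolidation
    timeline: list[str] = field(default_factory=list)   # append-only evidence

    def add_evidence(self, day: str, note: str, source: str) -> None:
        # Evidence is only ever appended, so the summary can be rebuilt from it.
        self.timeline.append(f"- {day}: {note} (source: {source})")

    def render(self) -> str:
        lines = ["---", f"name: {self.name}", f"kind: {self.kind}", "---", "",
                 f"# {self.name}", "", self.summary, "",
                 "## Timeline", *self.timeline]
        return "\n".join(lines) + "\n"


page = EntityPage("Acme Corp", "company", summary="Hypothetical robotics supplier.")
page.add_evidence("2026-04-10", "Mentioned in email thread", "email.db")
text = page.render()
```

Because the output is plain markdown, the same file can be diffed in git, read in Obsidian, and indexed for search without a separate export step.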

4. Entity extraction and enrichment loops

The strongest 2026 systems do not wait for user-written schemas; they auto-extract entities, relations, timestamps, and facts from conversation on write. Mem0 does this directly; GBrain treats entity detection/enrichment as a core loop.
Source: https://docs.mem0.ai/open-source/graph_memory/overview — Mem0 Docs — says the extraction LLM identifies entities, relationships, and timestamps on memory.add; https://github.com/garrytan/gbrain — Garry Tan, repo README — says the agent detects entities, updates pages, and syncs indexes in a compounding loop.

Temporal validity is still weakly handled across most frameworks. Mem0 can extract timestamps; LoCoMo and newer benchmarks show models still struggle with temporal and causal dynamics over long conversations.
Source: https://docs.mem0.ai/open-source/graph_memory/overview — Mem0 Docs — says extraction includes timestamps; https://huggingface.co/papers/2402.17753 — Maharana et al., “Evaluating Very Long-Term Conversational Memory of LLM Agents” — says models struggle with long-range temporal and causal dynamics.
Who is doing entity enrichment best? On public evidence: Mem0 is strongest on automated extraction into vector+graph; GBrain is strongest on human-auditable dossiers and compounding markdown pages; Anthropic MCP memory is the cleanest simple entity-relation-observation graph; Karpathy’s pattern is strongest on “compile once, query forever.”
Source: same four sources above.
Inference: this ranking is an architectural judgment from public documentation, not a controlled benchmark.

5. Academic foundations

Generative Agents (Park et al., 2023) established the modern reflection loop: store observations, retrieve by relevance/recency/importance, then periodically reflect to create higher-level inferences.
Source: https://huggingface.co/papers/2304.03442 — Park et al., “Generative Agents” — introduces believable agents driven by observation, planning, and reflection.

MemoryBank (Zhong et al.) pushed long-term conversational memory toward persistent user modeling and forgetting-curve-inspired updates.
Source: https://huggingface.co/papers/2305.10250 — Zhong et al., “MemoryBank” — proposes long-term memory with continual updates and user adaptation.
Recursive summarization (Wang et al.) remains one of the most durable practical ideas: keep updating a compressed memory summary instead of re-reading all history.
Source: https://huggingface.co/papers/2308.15022 — Wang et al., “Recursively Summarizing Enables Long-Term Dialogue Memory” — shows iterative memory summaries improve consistency in long conversations.
A-MEM (Xu et al.) represents the “agentic organizer” line of work: memory should categorize, consolidate, and forget, not merely append and retrieve.
Source: https://huggingface.co/papers/2502.12110 — Xu et al., “A-MEM” — proposes dynamic, agentic organization of memory inspired by Zettelkasten-like principles.
LoCoMo became the key benchmark for multi-session conversational memory, and it is still useful because it stresses temporal and causal recall rather than only nearest-neighbor retrieval.
Source: https://huggingface.co/papers/2402.17753 — Maharana et al., “Evaluating Very Long-Term Conversational Memory of LLM Agents” — introduces LoCoMo and shows both long-context and RAG still lag humans.
BEAM / LIGHT (ICLR 2026) is the strongest recent evidence that 1M-token context alone is not enough for genuine long-term memory.
Source: https://openreview.net/forum?id=y59hf5lrMn — Tavakoli et al., ICLR 2026 — says even 1M-context models struggle as dialogues lengthen, while a layered memory architecture improves performance.

6. Dream cycles / overnight consolidation

Yes, “dream cycles” are now a real implementation pattern. GBrain explicitly schedules nightly scans that enrich entities, repair citations, and consolidate memory. LangMem’s background memory manager is the same idea without the branding.
Source: https://github.com/garrytan/gbrain — Garry Tan, repo README — describes an overnight dream cycle; https://github.com/langchain-ai/langmem — LangMem README — describes a background memory manager that extracts, consolidates, and updates knowledge.

What these jobs do in practice: deduplicate memories, update summaries/dossiers, extract new entities and relations, compress stale dialogue into durable form, and fix links/citations.
Source: GBrain and LangMem sources above; https://docs.letta.com/guides/agents/architectures/memgpt — Letta Docs — describes recursive compaction and moving old messages to recall memory.
Is there measurable impact? The evidence is still stronger for memory architectures in general than for “overnight batch consolidation” specifically. Mem0 has benchmarked memory pipelines; BEAM/LIGHT benchmarks layered memory; but public controlled ablations isolating “night jobs” are still sparse.
Source: https://mem0.ai/research — Mem0 Research — reports accuracy/latency/token improvements from selective memory architectures; https://openreview.net/forum?id=y59hf5lrMn — BEAM/LIGHT — reports gains from layered memory components.
Inference: batch consolidation is plausible and widely adopted, but not yet well benchmarked as an isolated variable.
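One of the consolidation steps listed above, deduplication, can be sketched as a batch pass over stored memory strings. This is a deliberately simple token-overlap version; the function names and the 0.8 Jaccard threshold are assumptions, and a production job would more likely compare embeddings or ask an LLM to merge near-duplicates.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and strip punctuation so trivial variants compare equal."""
    return re.sub(r"[^a-z0-9 ]+", "", text.lower()).strip()


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two memory strings, in [0, 1]."""
    ta, tb = set(normalize(a).split()), set(normalize(b).split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def deduplicate(memories: list[str], threshold: float = 0.8) -> list[str]:
    """Keep the first occurrence of each memory; drop later near-duplicates."""
    kept: list[str] = []
    for m in memories:
        if all(jaccard(m, k) < threshold for k in kept):
            kept.append(m)
    return kept
```

Because the pass is quadratic in the number of kept memories, it suits a nightly batch better than the hot path.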

7. Tradeoffs and recommended stack for a single-user personal agent

Best default stack in 2026: markdown repo as source of truth, Postgres with pgvector plus FTS for hybrid retrieval, optional reranker, lightweight entity graph, and periodic consolidation jobs.
Source: https://github.com/garrytan/gbrain — Garry Tan, repo README — is the cleanest public embodiment of this stack; https://jxnl.co/writing/2025/09/11/lexical-search-in-rag-applications/ — Jason Liu — supports lexical+semantic combination; https://hamel.dev/notes/llm/rag/not_dead.html — Hamel Husain and Ben Clavié — argues retrieval quality still matters.

If you want the lowest setup cost, use long context + conversation summaries + a file-native vault first.
Source: https://ai.google.dev/gemini-api/docs/long-context — Google Docs — shows when long context can replace heavier memory plumbing; https://simonwillison.net/2025/Apr/7/long-context-llm/ — Simon Willison — notes long context is increasingly usable and inexpensive for some tasks.
If you want the best long-horizon recall with moderate complexity, use hybrid retrieval plus entity pages plus nightly consolidation. That is the current sweet spot.
Source: GBrain, Mem0 graph memory, Jason Liu lexical search, and LangMem background manager above.
Inference: this is the best tradeoff judgment from current public systems and benchmarks.
If you want maximum rigor and auditability, prefer markdown/files or append-only logs as the database of record, and treat vector DBs and graphs as derived indexes.
Source: https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f — Andrej Karpathy — raw sources are immutable source of truth, wiki is compiled; https://github.com/garrytan/gbrain — Garry Tan — markdown repo in git is source of truth.
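Treating indexes as derived means they can be thrown away and rebuilt from the files at any time. A minimal sketch of that idea, using a plain in-memory inverted index instead of Postgres or a vector store; `build_index` and `search` are hypothetical helper names, and the tokenizer is intentionally crude.

```python
import re
from collections import defaultdict
from pathlib import Path


def build_index(vault: Path) -> dict[str, set[str]]:
    """Rebuild a token -> files index from scratch by scanning markdown files."""
    index: dict[str, set[str]] = defaultdict(set)
    for path in sorted(vault.rglob("*.md")):
        text = path.read_text(encoding="utf-8").lower()
        for token in re.findall(r"[a-z0-9]+", text):
            index[token].add(str(path.relative_to(vault)))
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return files that contain every token in the query."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    if not tokens:
        return set()
    hits = [index.get(t, set()) for t in tokens]
    return set.intersection(*hits)
```

Nothing in the index is authoritative: deleting it loses no information, which is the property the markdown-as-truth approach is after.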

8. Contested questions

RAG vs long context: the current expert consensus is “both, routed by workload,” not total victory for either side. Google’s docs highlight use-cases where long context replaces retrieval; Hamel Husain explicitly argues “RAG is not dead”; Simon Willison and practitioners keep documenting context-rot failure modes.
Source: https://ai.google.dev/gemini-api/docs/long-context — Google Docs — long context changes design space; https://hamel.dev/notes/llm/rag/not_dead.html — Hamel Husain and Ben Clavié — retrieval still matters; https://simonwillison.net/2025/Jun/18/context-rot/ — Simon Willison — long context degrades when it accumulates distracting baggage.
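"Routed by workload" can be made concrete with a small router that sends small corpora to long context and everything else to retrieval. The four-characters-per-token estimate, the 200,000-token budget, and the 50% fill ratio are all assumptions for illustration; real thresholds depend on the model and on how quickly quality degrades as the window fills.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about four characters per token."""
    return max(1, len(text) // 4)


def choose_strategy(documents: list[str],
                    context_budget_tokens: int = 200_000,
                    fill_ratio: float = 0.5) -> str:
    """Use long context only when the corpus fits comfortably in the window."""
    total = sum(estimate_tokens(d) for d in documents)
    if total <= context_budget_tokens * fill_ratio:
        return "long_context"
    return "retrieval"
```

Keeping the fill ratio well below 1.0 leaves headroom for the conversation itself and reduces exposure to context rot.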

Vector DBs vs keyword+reranking: experts increasingly reject vector-only pipelines. Jason Liu’s retrieval work is representative of the shift toward lexical+semantic+rerank stacks.
Source: https://jxnl.co/writing/2025/09/11/lexical-search-in-rag-applications/ — Jason Liu — argues for combining lexical and semantic retrieval, not choosing one.
Do you need a graph? Public implementations say “sometimes.” Mem0 and Anthropic show real value for relation-heavy recall, but many personal-agent setups work fine with entity pages plus hybrid search.
Source: https://docs.mem0.ai/open-source/graph_memory/overview — Mem0 Docs — graph helps when actors/objects and relations matter; https://github.com/modelcontextprotocol/servers/blob/main/src/memory/README.md — MCP Memory README — graph memory is simple and useful; https://github.com/garrytan/gbrain — Garry Tan — gets a lot of graph-like value from markdown dossiers plus search.
Markdown-as-truth vs database-of-record: Karpathy and Garry Tan are firmly on the markdown side because it is inspectable and editable; hosted-memory systems prefer databases for operational simplicity.
Source: Karpathy gist and GBrain repo above.
Contrarian view: database-native systems are cleaner for multi-tenant apps and policy enforcement, but they sacrifice transparency unless they emit human-readable compiled views.
How much autonomy should memory have? Letta/MemGPT and A-MEM lean toward agent-managed memory edits; more conservative builders prefer explicit extraction pipelines and reviewable compiled pages.
Source: Letta memory tools docs; A-MEM paper; Karpathy/GBrain patterns.
Contrarian view: more autonomous memory editing improves adaptation but raises silent-corruption risk.

Bottom line

For a single-user personal agent, I would build:

Markdown or append-only files as truth
Postgres + pgvector + Postgres FTS for hybrid retrieval
Entity pages for people, companies, projects, and concepts
Optional graph edges only for relation-heavy use cases
Nightly consolidation jobs to update dossiers and summaries
Long-context mode for bounded research/code tasks, not as the only memory layer

That stack gives the best balance today across setup cost, recall quality, inspectability, maintenance, and portability.

Notable gaps

I did not find equally strong primary-source material in this pass for Chip Huyen or Eugene Yan specifically on agent memory architectures, so I excluded them from claim-bearing sections rather than cite weaker secondary summaries. Public detail on Devin/Cognition long-term memory is also still thin relative to Letta, Mem0, or LangMem.

Sources by type

Papers
- Generative Agents: https://huggingface.co/papers/2304.03442
- MemoryBank: https://huggingface.co/papers/2305.10250
- Recursively Summarizing: https://huggingface.co/papers/2308.15022
- LoCoMo: https://huggingface.co/papers/2402.17753
- A-MEM: https://huggingface.co/papers/2502.12110
- BEAM / LIGHT: https://openreview.net/forum?id=y59hf5lrMn

Framework docs / official docs
- Letta MemGPT architecture: https://docs.letta.com/guides/agents/architectures/memgpt
- Letta memory overview: https://docs.letta.com/guides/agents/memory
- Letta research background: https://docs.letta.com/concepts/letta
- Mem0 overview: https://docs.mem0.ai/features/contextual-add
- Mem0 graph memory: https://docs.mem0.ai/open-source/graph_memory/overview
- LlamaIndex memory: https://developers.llamaindex.ai/python/framework/module_guides/deploying/agents/memory/
- OpenAI conversation state: https://platform.openai.com/docs/guides/conversation-state?api-mode=responses
- OpenAI file search: https://platform.openai.com/docs/guides/tools-file-search
- OpenAI Assistants migration: https://platform.openai.com/docs/assistants
- Google Gemini long context: https://ai.google.dev/gemini-api/docs/long-context
- Anthropic Claude memory: https://www.anthropic.com/news/memory
- Anthropic MCP memory README: https://github.com/modelcontextprotocol/servers/blob/main/src/memory/README.md

Repos / gists
- LangMem: https://github.com/langchain-ai/langmem
- GBrain: https://github.com/garrytan/gbrain
- Karpathy LLM Wiki: https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f
- Obsidian Memory MCP: https://github.com/YuNaga224/obsidian-memory-mcp
- MCP Obsidian: https://github.com/bitbonsai/mcp-obsidian

Analyst / practitioner posts
- Simon Willison on context rot: https://simonwillison.net/2025/Jun/18/context-rot/
- Simon Willison on long context tooling: https://simonwillison.net/2025/Apr/7/long-context-llm/
- Hamel Husain + Ben Clavié, “Stop Saying RAG Is Dead”: https://hamel.dev/notes/llm/rag/not_dead.html
- Jason Liu on lexical search: https://jxnl.co/writing/2025/09/11/lexical-search-in-rag-applications/
- Jason Liu on structured extraction in RAG: https://jxnl.co/writing/2025/01/24/systematically-improving-rag-applications/

Product/help docs
- ChatGPT memory product post: https://openai.com/index/memory-and-new-controls-for-chatgpt
- ChatGPT memory help: https://help.openai.com/en/articles/8983136
- GPTs and memory limitation: https://help.openai.com/en/articles/8983148-does-memory-function-with-gpts%3F.zst

Snapshot

As of April 11, 2026, the field has converged on one practical idea: good agent memory is usually a layered system, not a single database. The strongest production designs combine some mix of working memory in-context, retrieval over external stores, and periodic consolidation into durable summaries, entity pages, or graphs. The biggest split is no longer “memory or no memory,” but what should be compiled ahead of time vs. retrieved on demand vs. left in a giant context window.

Mem0 is a “memory extraction and retrieval” system, not just chat history persistence: it extracts salient facts from conversations, stores them in vector memory, and can optionally mirror entities/relations into a graph store.
Source: https://docs.mem0.ai/features/contextual-add — Mem0 Docs, “Overview” — says Mem0 combines LLM extraction with vector storage; https://docs.mem0.ai/open-source/graph_memory/overview — Mem0 Docs, “Graph Memory” — says Mem0 extracts entities, relationships, and timestamps on write, stores embeddings in a vector DB, and adds graph context in parallel at retrieval time.
LangMem sits between framework and memory toolkit: it provides hot-path memory tools plus a background memory manager that extracts, consolidates, and updates knowledge over time, using LangGraph stores for persistence.
Source: https://github.com/langchain-ai/langmem — LangChain, GitHub repo README — says LangMem provides extraction, prompt refinement, long-term memory, hot-path tools, and a background memory manager.
LlamaIndex memory is explicitly block-based: short-term chat history flushes into long-term memory blocks, and those blocks can be static, fact-extraction, or vector-backed retrieval blocks.
Source: https://developers.llamaindex.ai/python/framework/module_guides/deploying/agents/memory/ — LlamaIndex Docs, “Memory” — documents token_limit, token_flush_size, short-term flushing, and StaticMemoryBlock, FactExtractionMemoryBlock, and VectorMemoryBlock.
GBrain is the most visible 2026 markdown-first pattern: markdown files are the source of truth in git; Postgres/pgvector provide hybrid retrieval; the workflow centers on entity detection, dossier pages, and a “dream cycle” for nightly consolidation.
Source: https://github.com/garrytan/gbrain — Garry Tan, GitHub repo README — states markdown-in-git is the source of truth, retrieval is Postgres + pgvector + hybrid search, and the dream cycle scans conversations, enriches entities, fixes citations, and consolidates memory.
Obsidian + MCP patterns are less a single product than a family of file-native memory systems: agents write markdown notes or entity files into an Obsidian vault, often with wiki-links or frontmatter as structure.
Source: https://github.com/YuNaga224/obsidian-memory-mcp — YuNaga224, GitHub repo README — stores each entity as a markdown file with YAML frontmatter and [[links]]; https://github.com/bitbonsai/mcp-obsidian — bitbonsai, GitHub repo README — exposes safe read/write/search access to an Obsidian vault over MCP.
Karpathy’s “LLM wiki” pattern is the cleanest statement of the markdown-as-compiled-memory idea: immutable raw sources, LLM-maintained wiki pages, and a schema file that tells the agent how to ingest, update, and query the wiki.
Source: https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f — Andrej Karpathy, GitHub Gist “llm-wiki” — defines the three-layer architecture of raw sources, wiki, and schema, with ingest updating many pages at once.
OpenAI API memory today is mostly conversation state, compaction, and hosted retrieval, not a full agent memory framework. The deprecated Assistants API kept thread state; the current Responses + Conversations APIs persist items, and /responses/compact shrinks old state, while File Search adds hosted semantic+keyword retrieval over vector stores.
Source: https://platform.openai.com/docs/guides/conversation-state?api-mode=responses — OpenAI Docs, “Conversation state” — documents durable conversations and stateless compaction; https://platform.openai.com/docs/guides/tools-file-search — OpenAI Docs, “File search” — says File Search uses semantic and keyword search over vector stores; https://platform.openai.com/docs/assistants — OpenAI Docs, “Assistants migration guide” — states Assistants are deprecated in favor of Responses/Conversations.
OpenAI’s product memory in ChatGPT is separate from the API story: ChatGPT has saved memories plus chat-history-based personalization, but custom GPTs still do not get this as builder-level persistent memory.
Source: https://openai.com/index/memory-and-new-controls-for-chatgpt — OpenAI, product post — explains saved memories and chat history; https://help.openai.com/en/articles/8983148-does-memory-function-with-gpts%3F.zst — OpenAI Help — says custom GPTs are stateless and do not currently support memory.
Anthropic’s public memory story has two layers: product memory in Claude app is project-scoped summary memory, while developer-side MCP memory is a local knowledge graph of entities, relations, and observations.
Source: https://www.anthropic.com/news/memory — Anthropic, product post — says Claude app memory is project-scoped and exposed as a memory summary; https://github.com/modelcontextprotocol/servers/blob/main/src/memory/README.md — Model Context Protocol / Anthropic-managed repo, README — defines the reference Memory server as a knowledge graph with entities, relations, and observations.
Cognition / Devin is visibly investing in retrieval and context infrastructure, but its long-term memory architecture is not publicly specified at the same depth as Letta/Mem0/LangMem. Public signals point more to “context graphs” and specialized retrieval than to a published personal-memory stack.
Source: https://cognition.ai/ — Cognition site — lists “Agent Trace: Capturing the Context Graph of Code” and SWE retrieval work, but does not publicly document a full long-term memory architecture.
Inference: public evidence supports “retrieval/context graph emphasis,” not a detailed memory hierarchy claim.
Google Gemini’s million-token context is a real memory alternative for bounded tasks, but Google itself frames it as short-term working memory, not durable cross-session memory.
Source: https://ai.google.dev/gemini-api/docs/long-context — Google AI for Developers, “Long context” — calls the context window analogous to short-term memory and positions 1M+ context as an alternative to dropping, summarizing, or RAG for some workloads.
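The entity/relation/observation model attributed above to the MCP memory server can be sketched in a few lines. This is an in-memory illustration of the data model only; the class and method names are hypothetical and do not reproduce the server's actual tool interface or storage format.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    entity_type: str
    observations: list[str] = field(default_factory=list)


@dataclass(frozen=True)
class Relation:
    source: str
    target: str
    relation_type: str


class KnowledgeGraph:
    def __init__(self) -> None:
        self.entities: dict[str, Entity] = {}
        self.relations: set[Relation] = set()

    def add_entity(self, name: str, entity_type: str) -> Entity:
        """Create the entity if missing; return the existing one otherwise."""
        return self.entities.setdefault(name, Entity(name, entity_type))

    def add_observation(self, name: str, text: str) -> None:
        """Attach a free-text observation, skipping exact repeats."""
        entity = self.entities[name]
        if text not in entity.observations:
            entity.observations.append(text)

    def relate(self, source: str, target: str, relation_type: str) -> None:
        """Add a directed, typed edge between two known entities."""
        if source not in self.entities or target not in self.entities:
            raise KeyError("both entities must exist before relating them")
        self.relations.add(Relation(source, target, relation_type))

    def neighbors(self, name: str) -> list[tuple[str, str]]:
        """Outgoing (relation_type, target) pairs, sorted for stable output."""
        return sorted((r.relation_type, r.target)
                      for r in self.relations if r.source == name)
```

Observations stay as plain strings attached to entities, which keeps the graph small and leaves interpretation to the model at read time.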

Keyword-only retrieval is underrated for agent memory, especially for names, code identifiers, dates, and literal phrases. Jason Liu’s retrieval writing and many production search stacks now treat lexical search as complementary, not obsolete.
Source: https://jxnl.co/writing/2025/09/11/lexical-search-in-rag-applications/ — Jason Liu, blog post — argues the real question is how to combine lexical and semantic retrieval, because lexical search remains important for precision-heavy queries.
Hybrid retrieval is the current practical default for production systems: combine lexical and semantic search, fuse results, then rerank. OpenAI File Search does semantic+keyword; GBrain explicitly advertises vector+keyword+RRF; Mem0 optionally reranks vector hits.
Source: https://platform.openai.com/docs/guides/tools-file-search — OpenAI Docs — says File Search uses semantic and keyword search; https://github.com/garrytan/gbrain — Garry Tan, repo README — says GBrain uses hybrid search with vector, keyword, and RRF; https://docs.mem0.ai/open-source/features/graph-memory — Mem0 Docs — says search is vector-based and can be reranked.
Graph-based retrieval helps when the question is relational or multi-hop: who knows whom, what company is linked to which project, what happened when, and what changed over time. Mem0 Graph Memory and Anthropic’s MCP memory both make this design explicit.
Source: https://docs.mem0.ai/open-source/graph_memory/overview — Mem0 Docs — says graph memory is useful when vectors blur multiple actors and objects; https://github.com/modelcontextprotocol/servers/blob/main/src/memory/README.md — MCP Memory README — defines entities, relations, and observations as first-class storage.
Long-context windows are now a genuine alternative to retrieval for some workloads, especially when the corpus is small, fixed, and benefits from full-document reasoning. But long-context still degrades in practice as irrelevant material accumulates.
Source: https://ai.google.dev/gemini-api/docs/long-context — Google Docs — positions long context as replacing summarization/RAG in some cases; https://simonwillison.net/2025/Jun/18/context-rot/ — Simon Willison quoting practitioner observation — summarizes “context rot,” where quality degrades as stale or distracting tokens accumulate; https://hamel.dev/notes/llm/rag/not_dead.html — Hamel Husain and Ben Clavié — argues bigger context does not change the economics or need for retrieval.
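The fusion step in hybrid retrieval is commonly done with reciprocal rank fusion (RRF), which needs only the rank positions from each retriever, not comparable scores. A minimal sketch follows; the constant k=60 is a conventional default, and the function name and tie-break rule are assumptions for this example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: each document scores sum(1 / (k + rank)) across lists."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


keyword_hits = ["a", "b", "c"]  # e.g. from full-text search
vector_hits = ["b", "c", "d"]   # e.g. from an embedding index
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

Documents that appear in both lists rise to the top, and a reranker can then be applied to the fused head of the list.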

Append-only logs remain important even in compiled systems, because they preserve provenance and let you rebuild summaries or dossiers. Karpathy explicitly keeps a log; GBrain uses append-only timelines on pages; Letta compacts but keeps older history in recall storage.
Source: https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f — Andrej Karpathy, gist — says ingest appends an entry to the log; https://github.com/garrytan/gbrain — Garry Tan, repo README — describes compiled truth on top and append-only timeline on the bottom; https://docs.letta.com/guides/agents/architectures/memgpt — Letta Docs — says older messages move to recall memory with recursive summaries.
Compiled entity pages are emerging as the most useful personal-memory abstraction: one page per person, company, concept, or project, with a stable summary plus evidence trail.
Source: https://github.com/garrytan/gbrain — Garry Tan, repo README — describes one page per person/company and “compiled dossiers”; https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f — Andrej Karpathy, gist — recommends entity pages, concept pages, summaries, comparisons, and syntheses.
The best systems now separate episodic and semantic memory, even when the storage layer is shared. Episodic memory keeps experiences/conversations; semantic memory keeps distilled facts and concepts.
Source: https://docs.mem0.ai/features/webhooks — Mem0 Docs — explicitly lists episodic and semantic memory types; https://openreview.net/forum?id=y59hf5lrMn — Tavakoli et al., ICLR 2026 BEAM/LIGHT — proposes separate long-term episodic memory, working memory, and a scratchpad.
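The pairing of an append-only log with a compiled entity page can be sketched as a pure function from log entries to a page: a summary on top and the evidence trail below. The `LogEntry` fields and the page layout are assumptions for illustration, and the default summary (the latest fact) is a stand-in for an LLM-written one.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class LogEntry:
    date: str    # ISO date string, so lexical order matches time order
    entity: str
    fact: str
    source: str  # where the fact came from, kept for provenance


def compile_page(entity: str, log: list[LogEntry],
                 summarize: Optional[Callable[[list[LogEntry]], str]] = None) -> str:
    """Render a page for one entity: compiled summary first, timeline after."""
    entries = sorted((e for e in log if e.entity == entity), key=lambda e: e.date)
    if summarize is not None:
        summary = summarize(entries)
    elif entries:
        summary = entries[-1].fact  # stand-in: most recent fact
    else:
        summary = "(no entries yet)"
    lines = [f"# {entity}", "", "## Compiled summary", summary, "", "## Timeline"]
    lines += [f"- {e.date}: {e.fact} (source: {e.source})" for e in entries]
    return "\n".join(lines)
```

Since the page is regenerated from the log, a bad summary can always be discarded and recompiled without losing evidence.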

Temporal validity is still weakly handled across most frameworks. Mem0 can extract timestamps; LoCoMo and newer benchmarks show models still struggle with temporal and causal dynamics over long conversations.
Source: https://docs.mem0.ai/open-source/graph_memory/overview — Mem0 Docs — says extraction includes timestamps; https://huggingface.co/papers/2402.17753 — Maharana et al., “Evaluating Very Long-Term Conversational Memory of LLM Agents” — says models struggle with long-range temporal and causal dynamics.
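One way to handle temporal validity explicitly is to store each fact with a validity interval and close the old interval when a newer value arrives. The `Fact` schema and the helpers below are a hypothetical sketch of that approach, not a description of how Mem0 or any other framework stores timestamps.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Fact:
    subject: str
    predicate: str
    value: str
    valid_from: date
    valid_to: Optional[date] = None  # None means "still believed true"


def record_fact(facts: list[Fact], new: Fact) -> None:
    """Append a fact and close any open fact it supersedes."""
    for f in facts:
        if (f.subject == new.subject and f.predicate == new.predicate
                and f.valid_to is None):
            f.valid_to = new.valid_from
    facts.append(new)


def value_as_of(facts: list[Fact], subject: str, predicate: str,
                when: date) -> Optional[str]:
    """Return the value that was valid on a given day, if any."""
    for f in facts:
        if (f.subject == subject and f.predicate == predicate
                and f.valid_from <= when
                and (f.valid_to is None or when < f.valid_to)):
            return f.value
    return None
```

Old values are never deleted, so "where did Alice work last year?" and "where does she work now?" can both be answered from the same store.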