
NotebookLM Deep Research — Memory Systems for AI Agents

Generated: 2026-04-11 15:43 UTC

Notebook: bec1d105-c3c1-4f92-a2e3-faf94f15e570


Summary

(no summary)

Per-source reports

1. Cognitive Architectures for Persistent Agency: A Technical Analysis of Modern Memory Systems in Large Language Model Agents


The evolution of artificial intelligence from stateless, task-oriented models to persistent, autonomous agents is one of the most significant architectural shifts since the arrival of transformer-based language models. At the heart of this transition is the development of memory systems that allow agents to learn, adapt, and maintain continuity across separate sessions. While early large language model (LLM) applications relied on simple context window management, contemporary frameworks such as Letta, Mem0, and LangMem implement tiered architectures loosely modeled on human memory. These systems address the inherent limitations of the "amnesiac" LLM by providing a dedicated stateful layer, often described through an operating-system metaphor in which the LLM serves as the central processing unit and the context window functions as a limited, high-speed random-access memory (RAM).[1, 2, 3]

The Operating System Metaphor and Virtual Context Management

Andrej Karpathy has influentially framed the LLM as a "new kind of operating system." In this framework, the prompt is not merely a set of instructions but the product of context engineering: the careful work of scheduling the most relevant information into the model's working memory at every "clock cycle."[2, 3, 4] This perspective shifts the focus from simple retrieval-augmented generation (RAG) to full lifecycle management of context. The context window, acting as RAM, is expensive and finite. Consequently, the memory manager in this "LLM OS" must implement scheduling policies that decide which data to load, which to evict, and which to prioritize, in order to prevent context failures such as contamination, interference, and confusion.[2, 3]

The Letta platform (formerly MemGPT) formalized this OS-inspired architecture. By implementing a virtual memory system, Letta lets agents operate as if they had an effectively unbounded context window. The architecture is built on three primary tiers: core memory, which is always pinned to the context; recall memory, which contains searchable conversation history; and archival memory, a long-term storage layer that the agent both writes to and queries via tool calls.[5, 6, 7] The newer letta_v1_agent architecture refines this further by relying on models' native reasoning capabilities and deprecating the earlier "heartbeat" mechanism, which forced the model to keep an explicit control loop running. It uses a Responses API to handle encrypted reasoning across providers, so that frontier models such as GPT-5 can use their internal reasoning tokens without the overhead of manual control-flow prompting.[5]

| Tier | Analogy | Mechanism | Persistence | Access Latency |
|---|---|---|---|---|
| Core Memory | RAM | System Prompt/Pinned Context | Session-bound (but synced) | Microseconds |
| Recall Memory | Disk Cache | Vector DB (Recent History) | Cross-session | 10-50 ms |
| Archival Memory | Cold Storage | External Data/Long-term DB | Permanent | 100-1000 ms |
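To make the tier split concrete, here is a minimal in-process sketch. It is not Letta's API: the class and method names (`TieredMemory`, `update_core`, `archive`, `search_archive`, `search_recall`) are hypothetical, and naive keyword overlap stands in for the embedding or database search a real recall or archival tier would use. What it illustrates is that only core memory and a few recent turns are rendered into the prompt; everything else has to be fetched through an explicit search call.

```python
from dataclasses import dataclass, field


@dataclass
class TieredMemory:
    """Hypothetical three-tier memory loosely mirroring core/recall/archival."""

    core: dict[str, str] = field(default_factory=dict)   # always rendered into the prompt
    recall: list[str] = field(default_factory=list)      # full conversation log, searchable
    archival: list[str] = field(default_factory=list)    # long-term store, read and written via tools

    def update_core(self, label: str, value: str) -> None:
        # Core blocks are small and edited in place by the agent.
        self.core[label] = value

    def log_message(self, message: str) -> None:
        self.recall.append(message)

    def archive(self, passage: str) -> None:
        self.archival.append(passage)

    @staticmethod
    def _keyword_search(items: list[str], query: str, limit: int) -> list[str]:
        # Rank items by how many query words they share; drop items with no overlap.
        terms = set(query.lower().split())
        scored = [(len(terms & set(item.lower().split())), item) for item in items]
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        return [item for score, item in ranked if score > 0][:limit]

    def search_recall(self, query: str, limit: int = 3) -> list[str]:
        return self._keyword_search(self.recall, query, limit)

    def search_archive(self, query: str, limit: int = 3) -> list[str]:
        return self._keyword_search(self.archival, query, limit)

    def render_context(self, recent_turns: int = 4) -> str:
        # Only core blocks and the last few turns occupy the context window.
        blocks = "\n".join(f"<{label}>{value}</{label}>" for label, value in self.core.items())
        recent_msgs = self.recall[-recent_turns:] if recent_turns > 0 else []
        return f"{blocks}\n---\n" + "\n".join(recent_msgs)
```

Keeping search as a separate, explicit step mirrors the design choice in the text: the agent pays a tool call to reach recall or archival memory, while core memory is always present.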

Karpathy and Garry Tan's GBrain project extend this philosophy into a "compiled knowledge" paradigm. Karpathy describes it through a compiler analogy: raw information (articles, papers, notes) acts as source code, the LLM functions as the compiler, and the synthesized output, often a structured wiki, serves as the executable.[8, 9] GBrain takes a "markdown-as-truth" approach, in which human-readable files are the primary system of record and databases are merely high-speed indices used for retrieval once standard search tools like grep become insufficient.[10]

Architectural Deep Dives: Letta, Mem0, and LangMem

The design patterns for these systems reveal a maturing engineering discipline focused on stateful persistence. Letta Code, designed specifically for development environments, introduces "Context Repositories." Unlike traditional database-backed memory, these repositories clone an agent's memory directly to a local git-backed filesystem.[11] This allows agents to use standard Unix primitives for memory management: an agent can run a bash script to batch-process its memory, split large files to avoid context bloat, and use git to maintain a versioned history of what it has learned.[11]

Virtual Memory as Filesystem in Letta

The progressive disclosure of memory is managed through a hierarchical folder structure. The filetree itself is always present in the system prompt, acting as a navigational signal. Each file contains YAML frontmatter describing its contents, which allows the agent to programmatically move files into a special system/ directory to "pin" them to the active context.[11] This architectural pattern enables "Divergent Learning," where an agent can maintain multiple memory worktrees in parallel—experimenting with different learning strategies before merging the most successful results back into the main branch.[11]
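A minimal sketch of how such a context might be assembled from a memory directory, assuming (hypothetically) that each file starts with a `---`-delimited frontmatter block containing a `description:` line. Every file contributes its path and description to the always-visible filetree, and only files under `system/` are inlined in full. The layout and the `build_context` and `read_description` functions are illustrative assumptions, not Letta Code's implementation; a real parser would use a YAML library rather than this line-based reader.

```python
from pathlib import Path


def read_description(text: str) -> str:
    """Return the `description:` value from a simple ---/--- frontmatter block, if any."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return ""
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter without a description
        key, sep, value = line.partition(":")
        if sep and key.strip() == "description":
            return value.strip()
    return ""


def build_context(root: Path) -> str:
    """Render the filetree (always) plus the full text of files pinned under system/."""
    tree_lines, pinned = [], []
    for path in sorted(root.rglob("*.md")):
        rel = path.relative_to(root)
        text = path.read_text(encoding="utf-8")
        tree_lines.append(f"{rel.as_posix()}: {read_description(text)}")
        if rel.parts[0] == "system":
            pinned.append(f"## {rel.as_posix()}\n{text}")
    return "FILETREE\n" + "\n".join(tree_lines) + "\n\nPINNED\n" + "\n\n".join(pinned)
```

Under this scheme, "pinning" a note is simply a file move into `system/`, which is exactly the kind of operation a shell-capable agent can perform on its own.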

The Universal Memory Layer and the A.U.D.N. Cycle

Mem0 adopts a different approach, positioning itself as a universal, self-improving memory layer. Its core innovation is a two-phase pipeline consisting of Extraction and Update.[12] When a new interaction occurs, the system does not simply store the log; it uses a MEMORY_DEDUCTION_PROMPT to distill facts, preferences, and entities into "memory candidates".[12] These candidates then undergo the A.U.D.N. (Add, Update, Delete, No-op) cycle. The LLM acts as a database operator, performing a semantic search for similar existing memories and deciding whether to add a new fact, update an existing one to resolve conflicts, delete outdated information, or ignore redundant data.[12]

| Operation | Trigger Condition | Outcome |
|---|---|---|
| ADD | New unique fact identified | New entry in vector/graph store |
| UPDATE | New info complements or refines old info | Existing entry is modified/merged |
| DELETE | New info contradicts previous info | Outdated entry is removed |
| NO-OP | Redundant or irrelevant info | No change to the memory store |
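Once the LLM has chosen an operation, the update phase reduces to applying it to the store. The sketch below assumes the model's answer has already been parsed into a hypothetical `Decision` record; the record shape, the `Op` enum and the dict-backed `MemoryStore` are illustrative stand-ins, not Mem0's internal types.

```python
import itertools
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Op(Enum):
    ADD = "add"
    UPDATE = "update"
    DELETE = "delete"
    NOOP = "noop"


@dataclass
class Decision:
    """One parsed LLM decision about a memory candidate (hypothetical shape)."""
    op: Op
    text: str = ""                   # new or merged fact text, used by ADD and UPDATE
    target_id: Optional[int] = None  # existing memory to change, used by UPDATE and DELETE


class MemoryStore:
    """Dict-backed stand-in for a vector or graph store."""

    def __init__(self) -> None:
        self.facts: dict[int, str] = {}
        self._next_id = itertools.count(1)

    def apply(self, decision: Decision) -> None:
        if decision.op is Op.ADD:
            self.facts[next(self._next_id)] = decision.text
        elif decision.op is Op.UPDATE:
            if decision.target_id not in self.facts:
                raise KeyError(f"no memory with id {decision.target_id}")
            self.facts[decision.target_id] = decision.text
        elif decision.op is Op.DELETE:
            self.facts.pop(decision.target_id, None)
        # Op.NOOP: the candidate was redundant or irrelevant, so the store is unchanged.
```

Treating each decision as data makes every memory change loggable and testable independently of the LLM call that produced it.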

This logic is implemented through a pluggable provider pattern, allowing Mem0 to integrate with various vector stores (like Qdrant or ChromaDB) and graph databases (Mem0g variant).[12] The graph component is particularly critical for modeling complex relationships, enabling multi-hop reasoning that vector-only systems struggle to achieve—for instance, remembering that "Alice's colleague, Bob, prefers Python".[12, 13]

Functional Memory Taxonomies in LangMem

The LangMem SDK from LangChain introduces a functional taxonomy that distinguishes between semantic, procedural, and episodic memory.[14] Semantic memory stores stable user facts and knowledge triplets, often with strict namespacing to ensure tenant isolation in multi-user environments.[15, 16] Procedural memory, perhaps the most distinctive aspect of LangMem, captures how to perform tasks: it stores learned behaviors as updated instructions in the system prompt, refined by prompt-optimization strategies such as "metaprompt" or "gradient" (the latter a critique-and-revise loop over the prompt text, not numerical gradient descent).[14] Episodic memory captures specific past events and successful problem-solving trajectories, often provided to the model as distilled few-shot examples.[14]
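Namespacing and the memory-type split can be illustrated with a plain dictionary keyed by tuples. This is a stand-in, not LangMem's store API: `NamespacedStore`, the `(user_id, memory_type)` namespace layout and the `system_prompt` key are assumptions chosen to show how one user's procedural and semantic memory stay isolated from another's.

```python
from collections import defaultdict

Namespace = tuple[str, ...]


class NamespacedStore:
    """Hypothetical store in which every read and write is scoped to a namespace tuple."""

    def __init__(self) -> None:
        self._data: dict[Namespace, dict[str, str]] = defaultdict(dict)

    def put(self, namespace: Namespace, key: str, value: str) -> None:
        self._data[namespace][key] = value

    def items(self, namespace: Namespace) -> dict[str, str]:
        # .get avoids creating empty namespaces; the copy keeps callers from mutating stored data.
        return dict(self._data.get(namespace, {}))

    def build_prompt(self, user_id: str, base_instructions: str) -> str:
        # Procedural memory (if any) replaces the base instructions; semantic facts are appended.
        instructions = self.items((user_id, "procedural")).get("system_prompt", base_instructions)
        facts = "\n".join(f"- {fact}" for fact in self.items((user_id, "semantic")).values())
        return f"{instructions}\n\nKnown facts:\n{facts}"
```

Because every lookup takes the user ID as part of the namespace, a prompt built for one user cannot accidentally include another user's facts.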

Academic Foundations and the Evolution of Cognitive Simulation

The theoretical underpinnings of agentic memory are rooted in the "Generative Agents" research by Park et al., which introduced a "Memory Stream" to simulate believable human behavior.[17, 18] In this architecture, every experience is logged in natural language and retrieved based on a score derived from three factors: recency, importance, and relevance. The importance score is particularly noteworthy; the agent uses an LLM to rate the significance of an event on a scale of 1 to 10. A mundane observation like "eating breakfast" receives a low score, while "receiving a party invitation" receives a high score, increasing its probability of retrieval in future relevant contexts.[18]

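The retrieval score is a weighted sum of the three components:

$$\text{score} = \alpha \cdot S_{\text{recency}} + \beta \cdot S_{\text{importance}} + \gamma \cdot S_{\text{relevance}}$$

where relevance is typically measured by the cosine similarity between the query embedding $Q$ and the memory embedding $M$:

$$S_{\text{relevance}} = \frac{Q \cdot M}{\lVert Q \rVert \, \lVert M \rVert}$$

[7, 18] In the original paper, recency decays exponentially (a factor of 0.995 per sandbox game hour since the memory was last retrieved), each component is min-max normalized to [0, 1], and all three weights are set to 1.[17]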
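A compact sketch of this retrieval function, following choices reported in the Generative Agents paper: equal weights, min-max normalization of each component, and a recency decay of 0.995 per hour since last retrieval. The `Memory` record and the plain-list embeddings are illustrative assumptions, not the authors' code.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    importance: float          # LLM-assigned rating, 1 (mundane) to 10 (poignant)
    embedding: list[float]
    hours_since_access: float  # time since the memory was last retrieved


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def min_max(values: list[float]) -> list[float]:
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)  # a constant component cannot change the ranking
    return [(v - lo) / (hi - lo) for v in values]


def retrieve(memories: list[Memory], query: list[float], k: int = 3,
             alpha: float = 1.0, beta: float = 1.0, gamma: float = 1.0,
             decay: float = 0.995) -> list[Memory]:
    """Return the top-k memories by alpha*recency + beta*importance + gamma*relevance."""
    if not memories:
        return []
    recency = min_max([decay ** m.hours_since_access for m in memories])
    importance = min_max([m.importance for m in memories])
    relevance = min_max([cosine(query, m.embedding) for m in memories])
    scores = [alpha * r + beta * i + gamma * v for r, i, v in zip(recency, importance, relevance)]
    order = sorted(range(len(memories)), key=lambda j: scores[j], reverse=True)
    return [memories[j] for j in order[:k]]
```

In a toy run with three memories, a recent, moderately important, highly relevant memory about a party outranks both the invitation itself (older) and a very recent but irrelevant, low-importance breakfast entry.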

Subsequent research has introduced more dynamic organizational patterns. The MemoryBank system (Zhong et al.) incorporates a forgetting mechanism inspired by the Ebbinghaus forgetting curve: memories decay over time unless reinforced by recall, which prevents memory saturation by selectively pruning less salient facts.[7, 19] In contrast, the A-MEM (Agentic Memory) framework adopts the Zettelkasten method, in which the agent autonomously generates "atomic notes" with structured attributes such as tags and keywords.[20, 21] Integrating a new memory triggers a "memory evolution" phase, in which the system analyzes existing notes to establish new links and update its overall understanding.[20, 21, 22]
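The forgetting-curve idea can be sketched as retention $R = e^{-t/S}$, where $t$ is the time since a memory was last recalled and $S$ is its strength. Following MemoryBank's simplified description, $S$ starts at 1 and grows by 1 on each recall while $t$ resets to 0. Measuring $t$ in days and pruning with a fixed retention threshold are assumptions made for this illustration, not details taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Trace:
    text: str
    strength: float = 1.0           # S: starts at 1
    days_since_recall: float = 0.0  # t: reset on every recall

    def retention(self) -> float:
        return math.exp(-self.days_since_recall / self.strength)

    def recall(self) -> None:
        # Recalling reinforces the memory: S grows, t resets.
        self.strength += 1
        self.days_since_recall = 0.0


def advance_and_prune(traces: list[Trace], days: float, threshold: float = 0.2) -> list[Trace]:
    """Age every trace by `days` and keep only those at or above a hypothetical retention threshold."""
    for trace in traces:
        trace.days_since_recall += days
    return [trace for trace in traces if trace.retention() >= threshold]
```

A fact recalled twice (strength 3) still has retention of about 0.51 after two days, whereas a never-recalled one drops to about 0.14 and is pruned at the 0.2 threshold.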

A more recent proposal, Proactive Memory Extraction (ProMem), addresses the "ahead-of-time" limitation of traditional summarization, which must compress history before the future task is known. Instead, ProMem implements a recurrent feedback loop.[23] The agent uses self-questioning to actively probe its dialogue history, recovering missing details and correcting hallucinated content in its summaries. This look-back mechanism aims to make the final memory extraction both complete and accurate for the query at hand.[23]

Retrieval Architecture Tradeoffs and Hybrid Systems

Modern agentic systems have largely moved away from pure vector search in favor of hybrid retrieval architectures. Pure vector search, while effective for semantic meaning, often fails on exact term matching—a common requirement in technical or legal domains.[24, 25]

Vector, Keyword, and Graph Retrieval

Keyword search (BM25) provides precise lexical matching, ensuring that specific IDs or names are correctly identified. Hybrid systems often combine vector and keyword results using Reciprocal Rank Fusion (RRF): $$\text{RRF}(d) = \sum_{r \in R} \frac{1}{k + r(d)}$$ where $R$ is the set of ranked result lists being fused (for example, one from BM25 and one from vector search), $r(d)$ is the rank of document $d$ in list $r$, and $k$ is a smoothing constant that dampens the influence of top ranks.[24]
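A minimal RRF implementation: each input is a list of document IDs ordered best first, ranks start at 1, and the default $k = 60$ is the value used in the original RRF paper by Cormack et al. and widely adopted since; treat it as a tunable assumption. The document IDs in the usage below are invented for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into one list sorted by RRF score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


bm25 = ["doc_invoice_4471", "doc_billing_faq", "doc_refunds"]
vector = ["doc_billing_faq", "doc_refunds", "doc_payment_terms"]
fused = reciprocal_rank_fusion([bm25, vector])
```

Documents found by both retrievers rise to the top, while a document that only one method finds (such as an exact invoice ID matched only by BM25) is still kept rather than discarded. Because RRF uses only ranks, it also sidesteps the problem that BM25 scores and cosine similarities live on incomparable scales.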

Graph-based retrieval adds a layer of relational intelligence. By modeling entities as nodes and relationships as edges, agents can perform graph traversals to answer multi-hop queries that would otherwise be "invisible" to a similarity-based vector search.[13, 25] However, graph systems require significantly more complex upkeep and can incur 2-5x the storage overhead of flat vector databases.[13]
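Multi-hop traversal over an adjacency list can be sketched with breadth-first search. The triples and relation names below are hypothetical, reusing the Alice/Bob example from the Mem0 discussion; a production graph store would express the same query in its own query language.

```python
from collections import deque
from typing import Optional

Edge = tuple[str, str, str]  # (subject, relation, object)


def build_graph(triples: list[Edge]) -> dict[str, list[tuple[str, str]]]:
    """Adjacency list: subject -> [(relation, object), ...]."""
    graph: dict[str, list[tuple[str, str]]] = {}
    for subj, rel, obj in triples:
        graph.setdefault(subj, []).append((rel, obj))
    return graph


def find_path(graph: dict[str, list[tuple[str, str]]], start: str, goal: str,
              max_hops: int = 3) -> Optional[list[Edge]]:
    """Breadth-first search: shortest chain of edges from start to goal, or None."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        if len(path) == max_hops:
            continue  # hop budget exhausted on this branch
        for rel, nxt in graph.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [(node, rel, nxt)]))
    return None
```

The returned path (Alice, colleague_of, Bob) then (Bob, prefers, Python) is the explicit chain of evidence that a single similarity lookup would not surface, and the `max_hops` bound keeps traversal cost predictable.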

| Architecture | Storage Format | Mechanism | Precision | Latency |
|---|---|---|---|---|
| Vector Store | Dense Embeddings | ANN (HNSW/IVF) | 85% | 10-100 ms |
| Keyword Index | Inverted Index | BM25 | 70% | 5-20 ms |
| Knowledge Graph | Adjacency Lists | BFS/DFS Traversal | 92% | 50-150 ms |
| Hybrid | Dual Indexing | Score Fusion (RRF) | 90%+ | 100-600 ms |

Glean and other enterprise platforms emphasize that the most robust architectures combine these signals with organizational metadata, such as permissions and authority rankings, to ensure that retrieved context is not only relevant but also governed and trustworthy.[25, 26]

Dream Cycles and Overnight Consolidation Patterns

A defining feature of advanced memory-first agents is the "Dream Cycle"—a background process inspired by human sleep that performs essential maintenance on the agent's knowledge base.[10] In the GBrain and OpenClaw ecosystems, this is implemented as DREAMS.md. During this cycle, the agent scans every conversation from the day to identify missing entities, fix broken citations, and consolidate redundant memories.[10]

Garry Tan notes that this creates a "compounding effect": an agent that enriches a person's profile after a meeting can automatically surface that context the next time the person is mentioned, even months later.[10] Karpathy's "LLM Knowledge Base" architecture utilizes a similar pattern of "active maintenance" or "linting," where the LLM scans the wiki for inconsistencies or new connections, effectively "healing" the knowledge base while the user is away.[8] Letta's "sleep-time compute" follows a similar logic, using a separate worktree to perform heavy reflection and reorganization without blocking the agent's operational thread during the workday.[11]
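One pass of such a maintenance job can be sketched over a set of markdown notes. This is a hypothetical illustration, not GBrain's DREAMS.md procedure: the `dream_pass` function only flags `[[wikilinks]]` whose target page does not exist yet (candidate missing entities) and bullet facts that occur more than once across the notes (candidates for consolidation). In a real cycle these findings would be handed to an LLM to write new pages or merge duplicates.

```python
import re
from collections import defaultdict

# Capture a wikilink target up to "]", "|" (alias) or "#" (heading anchor).
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def dream_pass(notes: dict[str, str]) -> tuple[set[str], dict[str, list[str]]]:
    """notes maps page title -> markdown body. Returns (missing entities, repeated facts)."""
    missing: set[str] = set()
    seen: dict[str, list[str]] = defaultdict(list)
    for title, body in notes.items():
        for target in WIKILINK.findall(body):
            if target.strip() not in notes:
                missing.add(target.strip())
        for line in body.splitlines():
            if line.startswith("- "):
                seen[line[2:].strip().lower()].append(title)
    repeated = {fact: titles for fact, titles in seen.items() if len(titles) > 1}
    return missing, repeated
```

Splitting detection from rewriting keeps the expensive LLM step focused on the handful of pages that actually need attention.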

Contested Questions: RAG vs. Long-Context and Storage Models

The industry remains divided on the long-term viability of RAG in the face of expanding context windows. While models like Gemini 1.5 Pro support over a million tokens, practitioners like Chip Huyen and Andrej Karpathy argue that RAG remains essential for production reliability.[2, 26, 27, 28]

The RAG vs. Long-Context Dialectic

Long-context models excel at exhaustive analysis of a single document but hit significant limits in production:
1. Latency: Processing 1 million tokens can take 30-60 seconds, whereas RAG-based systems typically respond in under 2 seconds.[27, 29]
2. Cost: Loading an entire knowledge base into every prompt costs far more in tokens than selectively retrieving the relevant passages.[27]
3. Fidelity: Models often exhibit the "lost in the middle" effect, where retrieval accuracy degrades for information located away from the beginning or end of a long context.[2, 27, 28]
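As a back-of-the-envelope illustration of the cost point, assume (purely hypothetically) a 1,000,000-token knowledge base, and a RAG prompt made of five retrieved 500-token chunks plus a 500-token question. The input tokens per query then compare as:

$$\frac{1{,}000{,}000}{5 \times 500 + 500} = \frac{1{,}000{,}000}{3{,}000} \approx 333$$

At any fixed per-token price, the full-context prompt costs roughly 333 times more per query in input tokens under these assumptions; provider features such as prompt caching can narrow that gap.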

The consensus among experts like Chip Huyen is that production agents will continue to use both: RAG for accessing vast, static document corpora, and a dedicated memory layer (like Mem0) for tracking stateful user preferences and conversation history.[26, 30]

Markdown-as-Truth vs. Database-as-Truth

A second contest concerns the underlying storage format. The "markdown-as-truth" philosophy, as seen in GBrain and Karpathy's personal "vaults," argues that a personal knowledge base should be human-readable and system-agnostic.[8, 10, 31] This approach avoids the "black box" of vector embeddings, allowing a human to edit, delete, or verify information directly in a text editor.[8] Conversely, the "DB-as-truth" model, common in enterprise applications, prioritizes structured search, ACID compliance, and multi-tenant isolation, treating markdown only as an export format rather than the source of truth.[3, 10, 13]

The Model Context Protocol (MCP) and Obsidian Patterns

The Model Context Protocol (MCP) has emerged as a widely adopted standard for connecting agents to local data.[31, 32, 33] Obsidian, with its local-first markdown approach, has become a popular environment for implementing these patterns. Obsidian MCP servers let agents like Claude Code read, search, and modify notes directly, effectively giving the AI a "structured, persistent brain."[31, 34, 35]

Two primary implementation patterns exist:
1. General Vault Access: Servers like cyanheads/obsidian-mcp-server provide comprehensive read, write, and search access to an existing vault, using an in-memory cache to deliver sub-millisecond search performance across large note collections.[35]
2. Entity-Centric Memory Graphs: Projects like YuNaga224/obsidian-memory-mcp and MegaMem transform conversations into an explicit knowledge graph within Obsidian. Every fact is stored as a node with versioned properties and timestamps, allowing the user to visualize the agent's internal "thinking" as a network of interconnected ideas.[32, 36]

| MCP Implementation | Primary Goal | Data Format | Best For |
|---|---|---|---|
| General Server | Vault Manipulation | Standard Markdown | Automating existing workflows |
| Memory Server | Graph Construction | Entity-centric (YAML + Links) | Building a visual AI second brain |
| MegaMem | Temporal Knowledge | Temporal Graph (Graphiti) | Multi-hop reasoning across time |
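The entity-centric pattern can be approximated by rendering each entity as a markdown note with frontmatter metadata and Obsidian-style `[[wikilinks]]` for its relations, so that Obsidian's graph view shows the links as edges. The field names (`type`, `updated`), section headings and the `render_entity_note` function are hypothetical; the projects named above each define their own schema.

```python
from datetime import date


def render_entity_note(name: str, entity_type: str, observations: list[str],
                       relations: list[tuple[str, str]], updated: date) -> str:
    """Render one entity as markdown: frontmatter, observed facts, then typed links."""
    lines = [
        "---",
        f"type: {entity_type}",
        f"updated: {updated.isoformat()}",  # timestamp so later passes can detect stale facts
        "---",
        f"# {name}",
        "",
        "## Observations",
        *[f"- {obs}" for obs in observations],
        "",
        "## Relations",
        *[f"- {rel} [[{target}]]" for rel, target in relations],
    ]
    return "\n".join(lines) + "\n"
```

Because the output is plain markdown, a human can open, correct or delete any fact in a text editor, which is the core argument for markdown-as-truth.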

Practical Tradeoffs for Personal Agent Deployment

For a single-user deployment, the architecture must balance privacy, latency, and cost. Local deployments using tools like Ollama or LM Studio offer the highest degree of data sovereignty, ensuring that sensitive information—from biometric data to private code—never leaves the user's hardware.[37, 38, 39]

Local vs. Cloud Tradeoffs

Cloud-based agents offer elastic scalability and access to frontier models like GPT-4 or Claude 3.5 Sonnet, which are far more capable than models that can run on consumer hardware.[38, 40] However, local agents avoid network round-trips (local file reads take microseconds) and protect against the risks of cloud vendor lock-in and service outages.[37, 40]

A significant risk in local deployments is the "responsibility stack": the user becomes responsible for hardware maintenance, backups, and security, including protection against GPU memory leaking between processes (e.g., the LeftoverLocals vulnerability, CVE-2023-4969).[38, 41] For many users, a hybrid approach is the most effective: local storage (like Obsidian) serves as the persistent system of record, cloud-based models handle complex reasoning tasks, and a strict governance layer (like LangMem's namespacing) manages data egress.[15, 37, 40]

Evaluation and Observability

Regardless of the deployment model, the reliability of a memory system depends on its evaluation framework. Hamel Husain emphasizes that "vibe checks" are insufficient for production agents.[42, 43] Systematic evaluation involves creating a failure taxonomy (categorizing errors as hallucinations, retrieval misses, or logic failures) and using "LLM-as-a-Judge" calibrated against human labels.[43, 44] Jason Liu's Instructor library addresses a related part of the problem by enforcing structured outputs, so that even as memory grows more complex, the agent's interactions with that memory remain typed, validated, and predictable.[45, 46]
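Calibrating an LLM judge against human labels comes down to a confusion-matrix comparison. The sketch below assumes binary pass/fail labels and reports raw agreement plus the judge's true positive rate (how often it passes what humans pass) and true negative rate (how often it fails what humans fail). The function names are hypothetical. Reporting the two rates separately matters because a judge that passes everything can still show high agreement on a mostly passing dataset.

```python
from collections import Counter


def judge_alignment(human: list[bool], judge: list[bool]) -> dict[str, float]:
    """Compare binary pass (True) / fail (False) labels from humans and an LLM judge."""
    if len(human) != len(judge) or not human:
        raise ValueError("need two non-empty label lists of equal length")
    tp = sum(h and j for h, j in zip(human, judge))
    tn = sum((not h) and (not j) for h, j in zip(human, judge))
    positives = sum(human)
    negatives = len(human) - positives
    return {
        "agreement": (tp + tn) / len(human),
        "tpr": tp / positives if positives else float("nan"),
        "tnr": tn / negatives if negatives else float("nan"),
    }


def failure_counts(failure_labels: list[str]) -> Counter:
    """Tally human-assigned failure categories, e.g. 'hallucination', 'retrieval_miss'."""
    return Counter(failure_labels)
```

In a toy set of six traces where the judge wrongly passes one of two human-failed cases, agreement is 5/6 but the true negative rate is only 0.5, which is the number that reveals a lenient judge.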

Towards Cognitive Autonomy

The convergence of virtual memory, hybrid retrieval, and autonomous consolidation suggests a future in which AI agents act less like tools and more like digital partners that evolve alongside their users. "Universal memory" layers and proactive extraction feedback loops mark a shift from passive retrieval to active knowledge management. As practitioners move toward deployments that keep data local while leveraging cloud-based reasoning, the "LLM OS" moves closer to reality: a system in which the agent remembers, reflects, and acts with continuity across sessions.[2, 12, 23]


  1. Letta is the platform for building stateful agents: AI with advanced memory that can learn and self-improve over time. - GitHub, https://github.com/letta-ai/letta
  2. Deep Dive into Context Engineering for Agents - Galileo AI, https://galileo.ai/blog/context-engineering-for-agents
  3. Is Your AI Agent Getting Dumber? Alibaba Cloud AnalyticDB Unveils AI Context Engineering, https://www.alibabacloud.com/blog/602803
  4. Engineering for AI Agents - Redis, https://redis.io/blog/engineering-for-ai-agents/
  5. Rearchitecting Letta's Agent Loop: Lessons from ReAct, MemGPT ..., https://www.letta.com/blog/letta-v1-agent
  6. Best AI Agent Memory Systems in 2026: 8 Frameworks Compared - Vectorize, https://vectorize.io/articles/best-ai-agent-memory-systems
  7. MemoryBank Architectures - Emergent Mind, https://www.emergentmind.com/topics/memorybank
  8. Karpathy shares 'LLM Knowledge Base' architecture that bypasses RAG with an evolving markdown library maintained by AI | VentureBeat, https://venturebeat.com/data/karpathy-shares-llm-knowledge-base-architecture-that-bypasses-rag-with-an
  9. What Is Andrej Karpathy's LLM Knowledge Base? The Compiler Analogy for AI Memory, https://www.mindstudio.ai/blog/karpathy-llm-knowledge-base-compiler-analogy
  10. garrytan/gbrain: Garry's Opinionated OpenClaw/Hermes ... - GitHub, https://github.com/garrytan/gbrain
  11. Introducing Context Repositories: Git-based Memory for Coding ..., https://www.letta.com/blog/context-repositories
  12. GitHub All-Stars #2: Mem0 - VirtusLab, https://virtuslab.com/blog/ai/git-hub-all-stars-2/
  13. AI Agent Memory Comparative Guide: RAG vs Vector Stores vs Graph-Based Approaches — March 15, 2025 - Sparkco, https://sparkco.ai/blog/ai-agent-memory-in-2026-comparing-rag-vector-stores-and-graph-based-approaches
  14. LangMem SDK for agent long-term memory - LangChain, https://www.langchain.com/blog/langmem-sdk-launch
  15. LangMem SDK for Agent Long-Term Memory - DigitalOcean, https://www.digitalocean.com/community/tutorials/langmem-sdk-agent-long-term-memory
  16. LangMem: Long-Term Memory for AI Agents | by Astropomeai | Medium, https://medium.com/@astropomeai/langmem-long-term-memory-for-ai-agents-366d7256ddce
  17. [PDF] Generative Agents: Interactive Simulacra of Human Behavior - Semantic Scholar, https://www.semanticscholar.org/paper/Generative-Agents%3A-Interactive-Simulacra-of-Human-Park-O%E2%80%99Brien/5278a8eb2ba2429d4029745caf4e661080073c81
  18. Generative Agents: Interactive Simulacra of Human Behavior - Abhinav Chinta, https://abhinavchinta.com/files/generative_agents_talk.pdf
  19. Memory OS of AI Agent - ACL Anthology, https://aclanthology.org/2025.emnlp-main.1318.pdf
  20. A-Mem: Agentic Memory for LLM Agents | OpenReview, https://openreview.net/forum?id=FiM0M8gcct
  21. A-MEM: Agentic Memory for LLM Agents - arXiv, https://arxiv.org/pdf/2502.12110
  22. NeurIPS Poster A-Mem: Agentic Memory for LLM Agents, https://neurips.cc/virtual/2025/poster/119020
  23. Beyond Static Summarization: Proactive Memory Extraction for LLM Agents - arXiv, https://arxiv.org/html/2601.04463v1
  24. Hybrid search benefits: Why RAG systems need both methods - Redis, https://redis.io/blog/hybrid-search-benefits-rag-systems/
  25. Knowledge graph vs vector database: how to choose your AI foundation - Glean, https://www.glean.com/blog/knowledge-graph-vs-vector-database
  26. AI Memory System vs RAG: Key Differences and Use Cases 2026 - Atlan, https://atlan.com/know/ai-memory-system-vs-rag/
  27. RAG vs Large Context Window: Real Trade-offs for AI Apps - Redis, https://redis.io/blog/rag-vs-large-context-window-ai-apps/
  28. Notes on 'AI Engineering' (Chip Huyen) chapter 6 - Alex Strick van Linschoten, https://alexstrick.com/posts/2025-01-24-notes-on-ai-engineering-chip-huyen-chapter-6.html
  29. RAG vs Long-Context LLMs: A Comprehensive Comparison | by Rost Glukhov | Medium, https://medium.com/@rosgluk/rag-vs-long-context-llms-a-comprehensive-comparison-9b30594c445e
  30. RAG vs. Memory: What AI Agent Developers Need to Know - Mem0, https://mem0.ai/blog/rag-vs-ai-memory
  31. Obsidian AI Second Brain: Complete Guide to Building Your AI-Powered Knowledge System (2026) | NxCode, https://www.nxcode.io/resources/news/obsidian-ai-second-brain-complete-guide-2026
  32. Unlocking Your AI's Brain: A Deep Dive into the Obsidian Memory MCP Server - Skywork, https://skywork.ai/skypage/en/ai-obsidian-memory-server/1978331309583015936
  33. modelcontextprotocol/servers: Model Context Protocol Servers - GitHub, https://github.com/modelcontextprotocol/servers
  34. Obsidian Memory System | Claude Code Skill for AI Persistence - MCP Market, https://mcpmarket.com/tools/skills/obsidian-memory-system
  35. cyanheads/obsidian-mcp-server - GitHub, https://github.com/cyanheads/obsidian-mcp-server
  36. C-Bjorn/MegaMem: Transform your Obsidian vault into a powerful knowledge graph with MCP support - GitHub, https://github.com/C-Bjorn/MegaMem
  37. Local vs Cloud Storage for AI Agents: A Complete Comparison - Fast.io, https://fast.io/resources/local-vs-cloud-agent-storage/
  38. Local AI vs Cloud AI: When Does Each Make Sense | by Jakub Jirak - Medium, https://jakubjirak.medium.com/local-ai-vs-cloud-ai-when-does-each-make-sense-2b374f9f5e48
  39. The Coming Breakup Between AI And The Cloud - Semiconductor Engineering, https://semiengineering.com/the-coming-breakup-between-ai-and-the-cloud/
  40. How to Evaluate Any New AI Agent Product Using Three Key Axes | MindStudio, https://www.mindstudio.ai/blog/how-to-evaluate-ai-agent-products-three-axes
  41. What Are the Key Risks of Deploying Local AI Agents in Business Environments? | Prelude, https://www.preludesecurity.com/blog/key-risks-of-deploying-local-agents
  42. AI Evals For Engineers & PMs by Hamel Husain and Shreya Shankar on Maven, https://maven.com/parlance-labs/evals
  43. LLM Evals: Everything You Need to Know – Hamel's Blog, https://hamel.dev/blog/posts/evals-faq/
  44. Evals Skills for Coding Agents – Hamel's Blog, https://hamel.dev/blog/posts/evals-skills/
  45. 567-labs/instructor: structured outputs for llms - GitHub, https://github.com/567-labs/instructor
  46. Instructor - Multi-Language Library for Structured LLM Outputs | Python, TypeScript, Go, Ruby - Instructor, https://python.useinstructor.com/

2. garrytan/gbrain: Garry's Opinionated OpenClaw/Hermes ... - GitHub

Source URL: https://github.com/garrytan/gbrain

3. Rearchitecting Letta's Agent Loop: Lessons from ReAct, MemGPT ...

Source URL: https://www.letta.com/blog/letta-v1-agent

4. Introducing Context Repositories: Git-based Memory for Coding ...

Source URL: https://www.letta.com/blog/context-repositories

5. GitHub All-Stars #2: Mem0 - VirtusLab

Source URL: https://virtuslab.com/blog/ai/git-hub-all-stars-2/

6. LangMem SDK for agent long-term memory - LangChain

Source URL: https://www.langchain.com/blog/langmem-sdk-launch

7. Best AI Agent Memory Systems in 2026: 8 Frameworks Compared - Vectorize

Source URL: https://vectorize.io/articles/best-ai-agent-memory-systems

8. Karpathy shares 'LLM Knowledge Base' architecture that bypasses RAG with an evolving markdown library maintained by AI | VentureBeat

Source URL: https://venturebeat.com/data/karpathy-shares-llm-knowledge-base-architecture-that-bypasses-rag-with-an

9. C-Bjorn/MegaMem: Transform your Obsidian vault into a powerful knowledge graph with MCP support - GitHub

Source URL: https://github.com/C-Bjorn/MegaMem

10. A-MEM: Agentic Memory for LLM Agents - arXiv

Source URL: https://arxiv.org/pdf/2502.12110

11. [PDF] Generative Agents: Interactive Simulacra of Human Behavior - Semantic Scholar

Source URL: https://www.semanticscholar.org/paper/Generative-Agents%3A-Interactive-Simulacra-of-Human-Park-O%E2%80%99Brien/5278a8eb2ba2429d4029745caf4e661080073c81

12. RAG vs. Memory: What AI Agent Developers Need to Know - Mem0

Source URL: https://mem0.ai/blog/rag-vs-ai-memory

13. RAG vs Large Context Window: Real Trade-offs for AI Apps - Redis

Source URL: https://redis.io/blog/rag-vs-large-context-window-ai-apps/

14. MemoryBank Architectures - Emergent Mind

Source URL: https://www.emergentmind.com/topics/memorybank

15. Beyond Static Summarization: Proactive Memory Extraction for LLM Agents - arXiv

Source URL: https://arxiv.org/html/2601.04463v1

16. Memory OS of AI Agent - ACL Anthology

Source URL: https://aclanthology.org/2025.emnlp-main.1318.pdf

17. Evals Skills for Coding Agents – Hamel's Blog

Source URL: https://hamel.dev/blog/posts/evals-skills/

18. Deep Dive into Context Engineering for Agents - Galileo AI

Source URL: https://galileo.ai/blog/context-engineering-for-agents

19. Hybrid search benefits: Why RAG systems need both methods - Redis

Source URL: https://redis.io/blog/hybrid-search-benefits-rag-systems/

20. Unlocking Your AI's Brain: A Deep Dive into the Obsidian Memory MCP Server - Skywork

Source URL: https://skywork.ai/skypage/en/ai-obsidian-memory-server/1978331309583015936

21. cyanheads/obsidian-mcp-server - GitHub

Source URL: https://github.com/cyanheads/obsidian-mcp-server

22. Obsidian Memory System | Claude Code Skill for AI Persistence - MCP Market

Source URL: https://mcpmarket.com/tools/skills/obsidian-memory-system

23. LangMem SDK for Agent Long-Term Memory - DigitalOcean

Source URL: https://www.digitalocean.com/community/tutorials/langmem-sdk-agent-long-term-memory

24. LangMem: Long-Term Memory for AI Agents | by Astropomeai | Medium

Source URL: https://medium.com/@astropomeai/langmem-long-term-memory-for-ai-agents-366d7256ddce

25. Obsidian AI Second Brain: Complete Guide to Building Your AI-Powered Knowledge System (2026) | NxCode

Source URL: https://www.nxcode.io/resources/news/obsidian-ai-second-brain-complete-guide-2026

26. Instructor - Multi-Language Library for Structured LLM Outputs | Python, TypeScript, Go, Ruby - Instructor

Source URL: https://python.useinstructor.com/

27. 567-labs/instructor: structured outputs for llms - GitHub

Source URL: https://github.com/567-labs/instructor

28. Notes on 'AI Engineering' (Chip Huyen) chapter 6 - Alex Strick van Linschoten

Source URL: https://alexstrick.com/posts/2025-01-24-notes-on-ai-engineering-chip-huyen-chapter-6.html

29. What Is Andrej Karpathy's LLM Knowledge Base? The Compiler Analogy for AI Memory

Source URL: https://www.mindstudio.ai/blog/karpathy-llm-knowledge-base-compiler-analogy

30. AI Agent Memory Comparative Guide: RAG vs Vector Stores vs Graph-Based Approaches — March 15, 2025 - Sparkco

Source URL: https://sparkco.ai/blog/ai-agent-memory-in-2026-comparing-rag-vector-stores-and-graph-based-approaches

31. Knowledge graph vs vector database: how to choose your AI foundation - Glean

Source URL: https://www.glean.com/blog/knowledge-graph-vs-vector-database

32. Local AI vs Cloud AI: When Does Each Make Sense | by Jakub Jirak - Medium

Source URL: https://jakubjirak.medium.com/local-ai-vs-cloud-ai-when-does-each-make-sense-2b374f9f5e48

33. Local vs Cloud Storage for AI Agents: A Complete Comparison - Fast.io

Source URL: https://fast.io/resources/local-vs-cloud-agent-storage/

34. Engineering for AI Agents - Redis

Source URL: https://redis.io/blog/engineering-for-ai-agents/

35. RAG vs Long-Context LLMs: A Comprehensive Comparison | by Rost Glukhov | Medium

Source URL: https://medium.com/@rosgluk/rag-vs-long-context-llms-a-comprehensive-comparison-9b30594c445e

36. Is Your AI Agent Getting Dumber? Alibaba Cloud AnalyticDB Unveils AI Context Engineering

Source URL: https://www.alibabacloud.com/blog/602803

37. Letta is the platform for building stateful agents: AI with advanced memory that can learn and self-improve over time. - GitHub

Source URL: https://github.com/letta-ai/letta

38. A-Mem: Agentic Memory for LLM Agents | OpenReview

Source URL: https://openreview.net/forum?id=FiM0M8gcct

39. NeurIPS Poster A-Mem: Agentic Memory for LLM Agents

Source URL: https://neurips.cc/virtual/2025/poster/119020

40. Generative Agents: Interactive Simulacra of Human Behavior - Abhinav Chinta

Source URL: https://abhinavchinta.com/files/generative_agents_talk.pdf

41. AI Evals For Engineers & PMs by Hamel Husain and Shreya Shankar on Maven

Source URL: https://maven.com/parlance-labs/evals

42. LLM Evals: Everything You Need to Know – Hamel's Blog

Source URL: https://hamel.dev/blog/posts/evals-faq/

43. AI Memory System vs RAG: Key Differences and Use Cases 2026 - Atlan

Source URL: https://atlan.com/know/ai-memory-system-vs-rag/

44. modelcontextprotocol/servers: Model Context Protocol Servers - GitHub

Source URL: https://github.com/modelcontextprotocol/servers

45. The Coming Breakup Between AI And The Cloud - Semiconductor Engineering

Source URL: https://semiengineering.com/the-coming-breakup-between-ai-and-the-cloud/

46. How to Evaluate Any New AI Agent Product Using Three Key Axes | MindStudio

Source URL: https://www.mindstudio.ai/blog/how-to-evaluate-ai-agent-products-three-axes

47. What Are the Key Risks of Deploying Local AI Agents in Business Environments? | Prelude

Source URL: https://www.preludesecurity.com/blog/key-risks-of-deploying-local-agents

48. Letta Code: A Memory-First Coding Agent

Source URL: https://www.letta.com/blog/letta-code

49. letta-ai/letta-code: The memory-first coding agent - GitHub

Source URL: https://github.com/letta-ai/letta-code

50. Mem0 - The Memory Layer for your AI Apps

Source URL: https://mem0.ai/

51. GitHub - mem0ai/mem0: Universal memory layer for AI Agents

Source URL: https://github.com/mem0ai/mem0

52. Mem0 - GitHub

Source URL: https://github.com/mem0ai

53. [EMNLP 2025 Oral] MemoryOS is designed to provide a memory operating system for personalized AI agents. - GitHub

Source URL: https://github.com/BAI-LAB/MemoryOS

54. langchain-ai/langmem - GitHub

Source URL: https://github.com/langchain-ai/langmem

55. API Reference - LangMem

Source URL: https://langchain-ai.github.io/langmem/reference/

56. Memory & Task Systems: Giving Your AI Agent a Brain - Graham Mann

Source URL: https://grahammann.net/blog/memory-and-task-systems-giving-your-ai-agent-a-brain

57. What Is AI Agent Memory? | IBM

Source URL: https://www.ibm.com/think/topics/ai-agent-memory

58. PREreview of “Generative Agents: Interactive Simulacra of Human Behavior”

Source URL: https://prereview.org/reviews/17993733

59. Generative Agents: Interactive Simulacra of Human Behavior - Summary - Portkey

Source URL: https://portkey.ai/blog/generative-agents-interactive-simulacra-of-human-behavior-summary/

60. Jason Liu: Introductions

Source URL: https://jxnl.co/

61. Instructor - Structure LLM Outputs with Ease

Source URL: https://useinstructor.com/

62. Frequently Asked Questions - Jason Liu

Source URL: https://jxnl.co/writing/2025/08/10/frequently-asked-questions/

63. How to Master AI Evals: A Step-by-Step Guide with Hamel Husain & Shreya Shankar

Source URL: https://www.aakashg.com/ai-evals-masterclass-with-hamel-shreya/

64. LLM Agent Memory: A Survey from a Unified Representation–Management Perspective

Source URL: https://www.preprints.org/manuscript/202603.0359/v1

65. HiMeS: Hippocampus-inspired Memory System for Personalized AI Assistants - arXiv.org

Source URL: https://arxiv.org/html/2601.06152v1

66. Shichun-Liu/Agent-Memory-Paper-List: The paper list of "Memory in the Age of AI Agents: A Survey" - GitHub

Source URL: https://github.com/Shichun-Liu/Agent-Memory-Paper-List

67. Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading | alphaXiv

Source URL: https://www.alphaxiv.org/overview/2310.05029

68. Best AI Agent Memory Frameworks 2026: Mem0, Zep, LangChain, Letta - Atlan

Source URL: https://atlan.com/know/best-ai-agent-memory-frameworks-2026/

69. Feature Request: Obsidian/External Vault Integration for Agent Context and Memory · Issue #1506 · AndyMik90/Aperant - GitHub

Source URL: https://github.com/AndyMik90/Aperant/issues/1506