Date: 2026-05-03 Tier: Deep (Tier-D, deep-research skill) Sources: 5 parallel providers + Reddit social signal. Codex/Gemini/GLM/Grok branches dropped (empty/timeout); Claude WebSearch carried primary load.
Router → Retriever → Document Grader → (Query Rewrite ↺ | Generate) → Hallucination Check → Answer + Citations
Layered prod arch: Orchestration (LangGraph state) · Planner (task decomposition) · Retriever (hybrid: BM25 + vector + RRF fusion) · Context Fusion (MMR dedup) · Tool Agent (SQL, code, APIs) · Reflection.
| Pattern | Mechanism | When to use |
|---|---|---|
| CRAG | Grader scores docs; <threshold → rewrite or web fallback | Cheap reliability layer, every prod system should have this |
| Self-RAG | Reflection tokens (IsREL, IsSUP, IsUSE) before commit | Regulated industries (legal/medical/finance) — 5.8% hallucination |
| GraphRAG | Entity-relation graph + community detection | Multi-hop reasoning, "themes across corpus" queries |
| HyDE | LLM generates hypothetical answer → use as retrieval query | Vague/empty user queries |
| Adaptive RAG | Router decides per-query: no-retrieval / single-shot / iterative | Mixed-difficulty workloads, cost optimization |
| Parent-Child chunking | Retrieve small, return large parent context | Lighter-weight alt to GraphRAG when graph too slow |
| Framework | Best for | Notes |
|---|---|---|
| LangGraph | Stateful workflows, HITL, checkpointing | Default orchestrator 2026. Pair with LangSmith |
| LlamaIndex AgentWorkflow | Retrieval-heavy pipelines | Stronger ingestion/parsing primitives (LlamaParse) |
| DSPy | Programmatic prompt optimization | Steepest learning curve; ML-systems-thinking required. Wins when retrieval+rerank+grader+generation interactions can't be hand-tuned |
| Cognee | Memory-first graph pipelines | 30+ source connectors, founder claims pure-RAG fails 40% of time → graph-memory needed. Gaps: TS SDK incomplete, TB-scale unproven |
| Haystack | Mature pipeline DAG | Less hot in 2026 vs LangGraph |
Composed-stack winner per multiple 2026 sources: LlamaIndex (retrieval) + LangGraph (orchestration) + RAGAS or LangSmith (eval).
| Mem0 | Zep | Letta | |
|---|---|---|---|
| Model | Fact extraction, three-tier scopes | Temporal knowledge graph | Self-editing memory blocks (MemGPT lineage) |
| Strength | Personalization, fastest to prod | Fact evolution over time | Long-running autonomous agents |
| Benchmark | LOCOMO winner (vendor) | LongMemEval 63.8% vs Mem0 49.0% | n/a — different category |
| Footprint | 1.7k tokens/conv (Mem0 paper) | 600k tokens/conv (Mem0 paper, contested) | OS-paging overhead |
| Funding/maturity | $24M Series A Oct 2025, v1.0 | Cloud-only advanced features | Open + commercial |
| Pitfall | Graph requires $249/mo Pro | Post-ingestion latency (background graph processing) | Loop overhead expensive on simple tasks |
Decision rule: chatbot personalization → Mem0 · evolving enterprise state → Zep · autonomous multi-day agents → Letta · deep KB retrieval → Cognee.
Lock-in warning: LangMem and LlamaIndex Memory tied to their frameworks — pick standalone (Mem0/Zep/Letta/Cognee/SuperMemory) if you might switch.
Stack: tracing platform + reference-free metrics library.
| Tool | Role | Notes |
|---|---|---|
| Langfuse | OSS tracing + evals + prompt mgmt | @observe() decorator, native RAGAS integration |
| Arize Phoenix | OSS observability, self-host | Strong trace UI, requires manual eval workflow setup |
| LangSmith | LangChain-native tracing | Default if you're already on LangGraph |
| RAGAS | Reference-free metrics (faithfulness, context precision/recall, answer relevancy) | 400k monthly downloads, 20M+ evals run |
| TruLens | RAG metrics + OTel tracing | Span-level diagnosis |
| DeepEval | Pytest-style regression suite | Add for CI/CD gating |
Pattern: score every trace if budget allows; otherwise sample N% nightly batch.
Per-node metadata to log: critic_score, retrieval_round, iteration_count, token_budget_used. Aggregate to spot which query types need 3+ rounds.
| Company | Pattern | Result |
|---|---|---|
| Fisher & Paykel (via Salesforce Agentforce) | Agentic RAG over manuals + CRM + policy | 66% external query autoresolve, 84% internal |
| Swisscom (CALM framework) | Customer-service agent rebuild | 20-week prototype→prod, 2× automation, 50% cost cut |
| Morgan Stanley | Internal financial research agents | Production deployment confirmed, no metrics public |
| PwC | Tax/compliance automation | Agentic RAG patterns |
| ServiceNow | Multi-turn RAG for IT workflows | Native platform feature |
| Samsung | Acquired Oxford Semantic Tech | Building next-gen KGs for supply chain |
| Lettria (vendor benchmark) | Hybrid GraphRAG, 4 domains | 80% correct vs 50.83% vector RAG (vendor source — flag) |
Market signal: Google Cloud 2025 ROI Report — 52% of GenAI enterprises run agents in prod, 88% positive ROI. Roots Analysis projects RAG market $1.96B (2025) → $40.34B (2035).
Gap: no LinkedIn/Uber/DoorDash engineering-blog level numbers found in this pass.
| Attack | Mechanism | Severity |
|---|---|---|
| BadRAG (Xue et al. 2024) | Inject docs with crafted high cosine similarity to target queries | 0.04% poison → 98.2% attack success on GPT-4/Claude-3 |
| TrojanRAG | Backdoor via fine-tune or corpus, trigger phrase activates | Persists across sessions, evades content filters |
| PoisonedRAG (USENIX Sec 2025) | Add 5 docs to millions | 90% wrong answer rate on triggers |
| Indirect prompt injection | Malicious content in indexed doc | OWASP LLM01:2025 |
| Vec2Text | Reconstruct source text from embeddings | 92% exact-match on short inputs — embeddings ≠ encrypted |
| Index over-scope | Chatbot retrieves docs user shouldn't see | "Innocent vendor onboarding question returned contract pricing" |