
Postgres pgvector vs Qdrant for a 5M-Vector RAG

Date: 2026-05-31
Type: Research
Status: At 5M vectors, pgvector wins on ops/cost if Postgres already in stack; Qdrant wins on filtered-query latency, quantization, and headroom past 10M.
Sources: pgvector-vs-qdrant-5m-vector-rag-2026-05-31.sources.json

TL;DR

5M vectors sit exactly on the documented inflection point. Both systems handle it, but they fail in different directions: pgvector on filtered-query latency and memory, Qdrant on operational overhead and dual-writes.

Pick by filter density and growth trajectory, not raw vector count.

Head-to-Head (5M scale, 768-1536 dim, HNSW)

| Axis | pgvector 0.9 | Qdrant |
|---|---|---|
| p50 latency (unfiltered, 1M) | 3-8ms | 2-5ms |
| p99 latency (unfiltered, 1M) | 15-25ms | 10-15ms |
| p50 filtered (500K + metadata) | ~25ms (heap-scan breaks index locality) | ~6ms (payload index) |
| Throughput (1024d, single box) | 5-15K QPS | 3,200+ QPS CPU-only, scales horizontally |
| Memory | HNSW must fit RAM; no native quantization | Scalar + binary quantization, optional rerank; ~65% memory reduction |
| HNSW build time @ 5M | 25+ min | comparable, parallelized |
| Hybrid (BM25 + dense) | Native via tsvector + RRF in SQL | Built-in sparse vectors + fusion |
| Filtering model | SQL WHERE; index/heap interaction expensive | First-class payload index, pre-filter or post-filter |
| Transactions with app data | ACID, same tx | Separate service, dual-write |
| Backups/replication/RLS | Inherits Postgres | Manual (snapshots, distributed mode) |
| Managed cost @ ~5M, 1K QPS | $180-500/mo (RDS db.r6g.large, Supabase, Neon) | $100-300/mo (Qdrant Cloud), $30-50 self-hosted small |
| Operational surface | One service | New service: Docker/k8s cluster, custom metrics, sharding |
| License | PostgreSQL | Apache 2.0 |

Numbers compiled from Tiger Data, Markaicode, CallSphere, and Nirant Kasliwal 1M-OpenAI benchmarks (see Sources).
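
Both hybrid rows in the table rely on rank fusion. A minimal sketch of reciprocal rank fusion (RRF) follows, assuming each retriever returns an ordered list of document IDs. The function name `rrf_fuse` and the constant `k=60` are illustrative choices (60 is a commonly used default), not values taken from either system.

```python
from collections import defaultdict
from typing import Hashable, Iterable, List, Sequence


def rrf_fuse(rankings: Iterable[Sequence[Hashable]], k: int = 60) -> List[Hashable]:
    """Fuse several ranked lists with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Higher fused score ranks first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties on the string form of the ID
    # so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], str(d)))


# Hypothetical result lists from a keyword retriever and a dense retriever.
keyword_hits = ["a", "b", "c"]
dense_hits = ["b", "c", "d"]
fused = rrf_fuse([keyword_hits, dense_hits])
```

Documents that appear in both lists ("b" and "c" here) rise above those found by only one retriever, which is the behaviour hybrid search wants. RRF uses only ranks, so the keyword and dense scores never need to be put on a common scale.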

Where pgvector wins

  1. Postgres already in stack. Adding vector search = CREATE EXTENSION vector; + one migration. No new service, no dual-write, no extra on-call rotation.
  2. Transactional consistency. Document + embedding in same tx → zero sync bugs, no eventual-consistency window between row and vector.
  3. Joins + SQL filters that touch business tables. WHERE tenant_id = $1 AND created_at > $2 ORDER BY embedding <-> $3 runs against real foreign keys; Qdrant payloads can't join.
  4. Cost floor. Marginal compute if existing Postgres has headroom. Tiger Data + pgvectorscale push the ceiling to 50M+ vectors at 471 QPS @ 99% recall.
  5. Backups, PITR, RLS, replication — inherited free.
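
Points 1 to 3 above can be sketched as a single migration plus two parameterized statements. This is a sketch under stated assumptions, not a reference schema: the table name `rag_chunks`, its columns, the 1024-dimension `halfvec` column, and the helper names are all hypothetical. The `%s` placeholders assume a DB-API driver such as psycopg; the connection is passed in rather than created here.

```python
from typing import Sequence, Tuple

# One migration: enable the extension, create the table, add an HNSW index.
# halfvec and halfvec_l2_ops require pgvector 0.7.0 or later.
SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS rag_chunks (
    id         bigserial PRIMARY KEY,
    tenant_id  bigint      NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now(),
    body       text        NOT NULL,
    embedding  halfvec(1024) NOT NULL
);
CREATE INDEX IF NOT EXISTS rag_chunks_embedding_hnsw
    ON rag_chunks USING hnsw (embedding halfvec_l2_ops);
"""

# Row and embedding are written by one statement, so they commit together.
INSERT_SQL = (
    "INSERT INTO rag_chunks (tenant_id, body, embedding) "
    "VALUES (%s, %s, %s::halfvec)"
)

# Business filters and nearest-neighbour ordering (<-> is L2 distance).
FILTERED_KNN_SQL = (
    "SELECT id, body FROM rag_chunks "
    "WHERE tenant_id = %s AND created_at > %s "
    "ORDER BY embedding <-> %s::halfvec LIMIT %s"
)


def vector_literal(vec: Sequence[float]) -> str:
    """Format a vector in pgvector's text form, e.g. '[0.5,1.0]'."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"


def insert_chunk(conn, tenant_id: int, body: str, embedding: Sequence[float]) -> None:
    """Insert text and embedding in one transaction on a DB-API connection."""
    cur = conn.cursor()
    cur.execute(INSERT_SQL, (tenant_id, body, vector_literal(embedding)))
    conn.commit()


def filtered_knn_params(tenant_id: int, since, query_vec: Sequence[float],
                        limit: int = 10) -> Tuple:
    """Build the parameter tuple for FILTERED_KNN_SQL."""
    return (tenant_id, since, vector_literal(query_vec), limit)
```

Because the embedding lives in the same row as the text, there is no window in which one exists without the other, which is the consistency argument in point 2.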

Where Qdrant wins

  1. Filtered queries. pgvector's HNSW + heap-scan combo degrades sharply under selective metadata filters; Qdrant's payload index keeps filtered p50 ≈ unfiltered p50. Critical for multi-tenant RAG.
  2. Memory pressure. Scalar + binary quantization with reranking gives ~65% memory cut. Lets 5M × 1536d fit on a much smaller box. pgvector has nothing equivalent in core (halfvec helps, not the same).
  3. Tail latency. p95 under 5ms at 1M is hard to match in pgvector under concurrency.
  4. Growth past 10M. Distributed mode + sharding is built in. pgvector scales vertically; pgvectorscale extends the runway but adds a non-trivial extension.
  5. Pure vector throughput per dollar. Rust + SIMD; 3-5× QPS/dollar at scale per several 2026 comparisons.
  6. GPU acceleration (new in 2026 Qdrant) for index build and search.
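
The memory argument in point 2 can be checked with back-of-envelope arithmetic. The sketch below counts raw vector storage only, for the 5M × 1536d case, at four component widths. It assumes 8-bit scalar quantization and 1-bit binary quantization, and it ignores HNSW graph overhead, payloads, and any original vectors kept for reranking, so it does not try to reproduce the ~65% figure quoted above.

```python
def raw_vector_bytes(n_vectors: int, dim: int, bits_per_component: int) -> int:
    """Bytes needed to store the vector components alone (no index, no payload)."""
    return n_vectors * dim * bits_per_component // 8


# Component widths in bits; the labels are descriptive, not API names.
FORMATS = {
    "float32": 32,
    "float16 (halfvec)": 16,
    "int8 scalar quantization": 8,
    "binary quantization": 1,
}


def footprint_gb(n_vectors: int, dim: int) -> dict:
    """Raw storage per format in decimal gigabytes (1 GB = 1e9 bytes)."""
    return {
        name: raw_vector_bytes(n_vectors, dim, bits) / 1e9
        for name, bits in FORMATS.items()
    }


if __name__ == "__main__":
    for name, gb in footprint_gb(5_000_000, 1536).items():
        print(f"{name:>26}: {gb:6.2f} GB")
```

For 5M × 1536d this works out to 30.72 GB at float32, 15.36 GB at float16, 7.68 GB at int8, and 0.96 GB at one bit per component. That gap is why quantization changes which box the index fits on, and why halfvec alone only halves the footprint.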

At exactly 5M vectors — decision tree

Postgres already in prod?
├─ NO  → Qdrant (don't bring Postgres just for vectors)
└─ YES → Heavy metadata filtering (multi-tenant, ACL, time windows)?
         ├─ YES → Qdrant (filter performance is the deciding factor)
         └─ NO  → Will dataset 3-5× in 12 months?
                  ├─ YES → Qdrant (avoid migration later)
                  └─ NO  → pgvector + HNSW + halfvec (cheapest, simplest)
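
The tree above can be written as a small function, which is handy if the choice is recorded in a design doc or checked in a review script. The function and parameter names are hypothetical; the branch order and outcomes mirror the tree exactly.

```python
def choose_vector_store(
    postgres_in_prod: bool,
    heavy_metadata_filtering: bool,
    expects_3x_to_5x_growth_in_12_months: bool,
) -> str:
    """Walk the decision tree for a ~5M-vector RAG workload."""
    if not postgres_in_prod:
        # Don't bring in Postgres just for vectors.
        return "Qdrant"
    if heavy_metadata_filtering:
        # Multi-tenant, ACL, or time-window filters: filter performance decides.
        return "Qdrant"
    if expects_3x_to_5x_growth_in_12_months:
        # Avoid a migration later.
        return "Qdrant"
    # Cheapest, simplest option when none of the above apply.
    return "pgvector + HNSW + halfvec"


# Example: existing Postgres, light filtering, stable dataset size.
choice = choose_vector_store(True, False, False)
```

Only one of the eight input combinations lands on pgvector, which makes the tree's bias explicit: pgvector is the pick when Postgres is already there and neither filtering nor growth forces the issue.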

Operational realities people forget

Cost sketch (5M × 1024d, ~1K QPS, p99 < 100ms)

Engineer time to run a second stateful service typically dwarfs the infra delta. Factor that in.

Counterpoints

Recommendation for a 5M-vector RAG in 2026

Sources