Postgres pgvector vs Qdrant for a 5M-Vector RAG

Date: 2026-05-31
Type: Research
Status: At 5M vectors, pgvector wins on ops/cost if Postgres is already in the stack; Qdrant wins on filtered-query latency, quantization, and headroom past 10M.
Sources: pgvector-vs-qdrant-5m-vector-rag-2026-05-31.sources.json

TL;DR

5M vectors sit exactly on the documented inflection point. Both systems handle it, but they fail in different directions:

pgvector (HNSW, v0.9+) — sub-20ms p50 at ~95% recall on a single beefy Postgres box, transactional with your app data, ~$0 incremental if Postgres already exists.
Qdrant — ~2-5ms p50 / ~10-15ms p99 at 1M, Rust+SIMD, native scalar/binary quantization (~65% memory cut vs IVFFlat), filtered-query latency stays flat where pgvector's degrades 4-5×.

Pick by filter density and growth trajectory, not raw vector count.

Head-to-Head (5M scale, 768-1536 dim, HNSW)

Axis	pgvector 0.9	Qdrant
p50 latency (unfiltered, 1M)	3-8ms	2-5ms
p99 latency (unfiltered, 1M)	15-25ms	10-15ms
p50 filtered (500K + metadata)	~25ms (heap-scan breaks index locality)	~6ms (payload index)
Throughput (1024d, single box)	5-15K QPS	3,200+ QPS CPU-only, scales horizontally
Memory	HNSW must fit RAM; halfvec and binary_quantize only, no int8 scalar quantization	Scalar + binary quantization, optional rerank; ~65% memory reduction
HNSW build time @ 5M	25+ min	comparable, parallelized
Hybrid (BM25 + dense)	Native via tsvector + RRF in SQL	Built-in sparse vectors + fusion
Filtering model	SQL WHERE — index/heap interaction expensive	First-class payload index, pre-filter or post-filter
Transactions with app data	ACID, same tx	Separate service, dual-write
Backups/replication/RLS	Inherits Postgres	Manual (snapshots, distributed mode)
Managed cost @ ~5M, 1K QPS	$180-500/mo (RDS db.r6g.large, Supabase, Neon)	$100-300/mo (Qdrant Cloud), $30-50 self-hosted small
Operational surface	One service	New service: Docker/k8s cluster, custom metrics, sharding
License	PostgreSQL	Apache 2.0

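Numbers compiled from Tiger Data, Markaicode, CallSphere, and Nirant Kasliwal 1M-OpenAI benchmarks (see Sources). The latency rows were measured at 1M (unfiltered) and 500K (filtered) vectors, so treat them as directional for 5M rather than as direct measurements.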

Where pgvector wins

Postgres already in stack. Adding vector search = CREATE EXTENSION vector; + one migration. No new service, no dual-write, no extra on-call rotation.
Transactional consistency. Document + embedding in same tx → zero sync bugs, no eventual-consistency window between row and vector.
Joins + SQL filters that touch business tables. WHERE tenant_id = $1 AND created_at > $2 ORDER BY embedding <-> $3 runs against real foreign keys; Qdrant payloads can't join.
Cost floor. Marginal compute if existing Postgres has headroom. Tiger Data + pgvectorscale push the ceiling to 50M+ vectors at 471 QPS @ 99% recall.
Backups, PITR, RLS, replication — inherited free.
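
As a rough illustration of the "CREATE EXTENSION vector; + one migration" point, the sketch below builds the SQL such a migration might contain. The table name `documents`, the column name `embedding`, and the index name are hypothetical; `m = 16` and `ef_construction = 64` are pgvector's documented HNSW defaults, and `halfvec` assumes pgvector 0.7 or later. It only produces strings, so run them through your own migration tool.

```python
def pgvector_migration(table: str = "documents", dim: int = 1536) -> list[str]:
    """Return SQL for a hypothetical one-step pgvector migration.

    Table, column, and index names are illustrative assumptions.
    """
    return [
        # Registers the vector, halfvec, sparsevec types and their operators.
        "CREATE EXTENSION IF NOT EXISTS vector;",
        # halfvec stores float16 components, halving vector storage vs. vector.
        f"ALTER TABLE {table} ADD COLUMN embedding halfvec({dim});",
        # CONCURRENTLY avoids blocking writes during the build; it cannot run
        # inside a transaction block, so many migration tools need a flag for it.
        f"CREATE INDEX CONCURRENTLY {table}_embedding_hnsw "
        f"ON {table} USING hnsw (embedding halfvec_cosine_ops) "
        "WITH (m = 16, ef_construction = 64);",
    ]


for statement in pgvector_migration():
    print(statement)
```

The cosine operator class is an assumption; swap in `halfvec_l2_ops` or `halfvec_ip_ops` to match whatever distance your embedding model expects.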

Where Qdrant wins

Filtered queries. pgvector's HNSW + heap-scan combo degrades sharply under selective metadata filters; Qdrant's payload index keeps filtered p50 ≈ unfiltered p50. Critical for multi-tenant RAG.
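Memory pressure. Scalar + binary quantization with reranking gives ~65% memory cut. Lets 5M × 1536d fit on a much smaller box. pgvector's closest equivalents are halfvec and binary_quantize expression indexes (0.7+), with any reranking written by hand in SQL; it has no int8 scalar quantization in core.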
Tail latency. p95 under 5ms at 1M is hard to match in pgvector under concurrency.
Growth past 10M. Distributed mode + sharding is built in. pgvector scales vertically; pgvectorscale extends the runway but adds a non-trivial extension.
Pure vector throughput per dollar. Rust + SIMD; 3-5× QPS/dollar at scale per several 2026 comparisons.
GPU acceleration (new in 2026 Qdrant) for index build and search.
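
To make the memory-pressure point concrete, here is a back-of-the-envelope sketch of raw vector storage at different component widths. It counts vector payload only; HNSW graph links, payloads, and any full-precision copies kept for reranking are deliberately excluded, so real footprints will be larger.

```python
def vector_bytes(n_vectors: int, dim: int, bits_per_component: int) -> int:
    """Raw vector payload in bytes, excluding graph links and metadata."""
    return n_vectors * dim * bits_per_component // 8


N_VECTORS, DIM = 5_000_000, 1536

# Component widths: float32 (vector), float16 (halfvec), int8 scalar
# quantization, and 1-bit binary quantization.
for label, bits in [("float32", 32), ("float16", 16), ("int8", 8), ("binary", 1)]:
    gigabytes = vector_bytes(N_VECTORS, DIM, bits) / 1e9
    print(f"{label:>8}: {gigabytes:6.2f} GB")
```

For 5M × 1536d this works out to 30.72 GB at float32, 15.36 GB at float16, 7.68 GB at int8, and 0.96 GB at 1 bit per component. Quantized setups that rerank typically still keep the original vectors somewhere, often on disk, so the saving applies to what must stay in RAM rather than to total storage.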

At exactly 5M vectors — decision tree

Postgres already in prod?
├─ NO  → Qdrant (don't bring in Postgres just for vectors)
└─ YES → Heavy metadata filtering (multi-tenant, ACL, time windows)?
         ├─ YES → Qdrant (filter performance is the deciding factor)
         └─ NO  → Will dataset 3-5× in 12 months?
                  ├─ YES → Qdrant (avoid migration later)
                  └─ NO  → pgvector + HNSW + halfvec (cheapest, simplest)
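
The tree above can be written as a small function, which is handy for checking that a team agrees on the inputs before arguing about the output. This is only a sketch of the tree as drawn; the function and parameter names are made up for illustration.

```python
def pick_vector_store(
    postgres_in_prod: bool,
    heavy_metadata_filtering: bool,
    expects_3x_growth_in_12_months: bool,
) -> str:
    """Walk the decision tree top to bottom and return the suggested store."""
    if not postgres_in_prod:
        # Don't bring in Postgres just for vectors.
        return "Qdrant"
    if heavy_metadata_filtering:
        # Multi-tenant, ACL, or time-window filters dominate the decision.
        return "Qdrant"
    if expects_3x_growth_in_12_months:
        # Avoid a migration later.
        return "Qdrant"
    return "pgvector + HNSW + halfvec"


print(pick_vector_store(True, False, False))
```

Only one of the eight input combinations lands on pgvector, which matches the tree: every "unfavorable" answer short-circuits to Qdrant.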

Operational realities people forget

HNSW must fit in RAM on pgvector for production latency. At 5M × 1536d float32 ≈ 30GB just for vectors before index overhead. Provision accordingly or use halfvec (float16) to halve it.
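pgvector HNSW builds were single-threaded before 0.6.0; newer versions parallelize the build via max_parallel_maintenance_workers. A 5M HNSW build can still run for ~25 min, and a plain CREATE INDEX blocks writes to the table while it runs. Plan a maintenance window or use CREATE INDEX CONCURRENTLY.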
Qdrant snapshots ≠ Postgres PITR. If point-in-time recovery matters, that's a real operational gap.
Qdrant payload filters need explicit payload indexes to stay fast — easy to forget and then blame "Qdrant got slow".
Hybrid retrieval (BM25 + dense + rerank) is now native in both — pgvector via tsvector + RRF in SQL, Qdrant via sparse vectors + fusion. Neither is the deciding factor anymore.
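
For readers who have not met RRF, the fusion step mentioned in the hybrid-retrieval point is small enough to sketch in a few lines. This is a generic reciprocal rank fusion over ranked ID lists, not the exact SQL or Qdrant implementation; `k = 60` is the commonly used constant, and the document IDs are invented for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse ranked lists: each list adds 1 / (k + rank) to a document's score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


bm25_hits = ["a", "b", "c"]    # hypothetical keyword (BM25) ranking
dense_hits = ["b", "c", "d"]   # hypothetical dense-vector ranking
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
print([doc_id for doc_id, _ in fused])
```

Documents that appear in both lists ("b" and "c") outrank those that top only one list, which is the behavior that makes RRF a reasonable default when the two score scales are not comparable.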

Cost sketch (5M × 1024d, ~1K QPS, p99 < 100ms)

pgvector on RDS db.r6g.large (16 GiB RAM, so 5M × 1024d needs halfvec or a larger instance class to stay in memory): ~$180-300/mo dedicated, or $0 marginal if shared.
Supabase / Neon with pgvector — included in plan; watch egress and compute add-on.
Qdrant Cloud (managed) — ~$100-300/mo for this footprint.
Qdrant self-hosted (Hetzner / similar) — $30-80/mo for the box; engineer time is the real cost.

Engineer time to run a second stateful service typically dwarfs the infra delta. Factor that in.
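
A minimal worked example of that claim follows. The infra figures are midpoints of the ranges listed above; the ops hours and hourly rate are hypothetical placeholders, not numbers from the sources, so substitute your own before drawing conclusions.

```python
def monthly_total(infra_usd: float, ops_hours: float, hourly_rate_usd: float) -> float:
    """Infra bill plus the cost of engineer hours spent operating the service."""
    return infra_usd + ops_hours * hourly_rate_usd


# ASSUMPTIONS (not from the sources): a second stateful service costs about
# 6 extra engineer hours per month at a loaded rate of $100/hour, while
# pgvector on an existing Postgres adds no extra ops hours.
HOURLY_RATE_USD = 100.0
EXTRA_OPS_HOURS = 6.0

pgvector_dedicated = monthly_total(240.0, 0.0, HOURLY_RATE_USD)            # midpoint of $180-300
qdrant_self_hosted = monthly_total(55.0, EXTRA_OPS_HOURS, HOURLY_RATE_USD)  # midpoint of $30-80

print(f"pgvector dedicated: ${pgvector_dedicated:.0f}/mo")
print(f"Qdrant self-hosted: ${qdrant_self_hosted:.0f}/mo")
```

Under these assumed inputs the cheaper box ends up the more expensive option overall ($655 vs $240 per month), and the result flips only if the extra ops time drops below roughly two hours a month.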

Counterpoints

"They're the same speed." Aleksapolskyi's Medium benchmark on 5,500 docs found identical latency with both on HNSW — true at toy scale, irrelevant at 5M. Cited so you don't get blindsided in a meeting.
pgvectorscale changes the math. Tiger Data's pgvectorscale (StreamingDiskANN in Rust) holds 471 QPS @ 99% recall at 50M vectors, which narrows Qdrant's "scale past 10M" advantage significantly. If you're a Postgres shop, evaluate it before defaulting to Qdrant.
"As little infra as necessary." The 2026 industry trend is extending the relational DB, not adding specialized stores. Default bias should lean pgvector unless a concrete failure mode forces Qdrant.
Filter benchmarks vary by query shape. "pgvector degrades to 25ms under filter" is true for selective filters that break HNSW locality; partial indexes or iterative scan (pgvector 0.8+) mitigate this. Re-benchmark on your actual filter distribution before deciding.
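
The two mitigations named in the last point can be sketched as SQL, built here as Python strings so they can be dropped into a benchmark harness. Table, column, and index names are hypothetical, the `%(name)s` placeholders assume a psycopg-style driver, and `hnsw.iterative_scan` assumes pgvector 0.8 or later with a `halfvec` embedding column.

```python
def filtered_search_sql(table: str = "documents", limit: int = 10) -> list[str]:
    """Session setting plus a tenant-filtered nearest-neighbor query."""
    return [
        # Keep scanning the HNSW index until enough rows pass the filter,
        # returning results in exact distance order.
        "SET hnsw.iterative_scan = strict_order;",
        f"SELECT id FROM {table} "
        "WHERE tenant_id = %(tenant_id)s "
        "ORDER BY embedding <=> %(query)s::halfvec "
        f"LIMIT {int(limit)};",
    ]


def partial_index_sql(table: str, tenant_id: int) -> str:
    """Per-tenant partial HNSW index; only sensible for a few large tenants."""
    tid = int(tenant_id)  # index predicates cannot be parameterized
    return (
        f"CREATE INDEX CONCURRENTLY {table}_tenant_{tid}_hnsw "
        f"ON {table} USING hnsw (embedding halfvec_cosine_ops) "
        f"WHERE (tenant_id = {tid});"
    )


for statement in filtered_search_sql():
    print(statement)
print(partial_index_sql("documents", 42))
```

Iterative scan trades some latency for recall under selective filters, while partial indexes trade index count for locality; which one helps more depends on how skewed your tenant sizes are, so measure both on real filter distributions.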

Recommendation for a 5M-vector RAG in 2026

Default: pgvector 0.9 + HNSW + halfvec on managed Postgres if Postgres is in stack and filters are simple. Cheapest, simplest, transactional.
Switch to Qdrant if any of these are true: heavy per-query metadata filtering, multi-tenant ACL filters, sub-10ms p99 SLO, RAM-constrained host, or projected growth to 20M+ within 12 months.
If pgvector starts hurting before any of those triggers apply, try pgvectorscale first; it keeps the single-stack advantage.