Date: 2026-05-31
Type: Research
Status: At 5M vectors, pgvector wins on ops/cost if Postgres already in stack; Qdrant wins on filtered-query latency, quantization, and headroom past 10M.
Sources: pgvector-vs-qdrant-5m-vector-rag-2026-05-31.sources.json
TL;DR
5M vectors sit exactly on the documented inflection point. Both systems handle it, but they fail in different directions:
- pgvector (HNSW, v0.9+) — sub-20ms p50 at ~95% recall on a single beefy Postgres box, transactional with your app data, ~$0 incremental if Postgres already exists.
- Qdrant — ~2-5ms p50 / ~10-15ms p99 at 1M, Rust+SIMD, native scalar/binary quantization (~65% memory cut vs IVFFlat), filtered-query latency stays flat where pgvector's degrades 4-5×.
Pick by filter density and growth trajectory, not raw vector count.
Head-to-Head (5M scale, 768-1536 dim, HNSW)
| Axis | pgvector 0.9 | Qdrant |
|---|---|---|
| p50 latency (unfiltered, 1M) | 3-8ms | 2-5ms |
| p99 latency (unfiltered, 1M) | 15-25ms | 10-15ms |
| p50 filtered (500K + metadata) | ~25ms (heap-scan breaks index locality) | ~6ms (payload index) |
| Throughput (1024d, single box) | 5-15K QPS | 3,200+ QPS CPU-only, scales horizontally |
| Memory | HNSW must fit RAM; no built-in quantized index (halfvec and binary_quantize exist, rescoring is manual) | Scalar + binary quantization, optional rerank; ~65% memory reduction |
| HNSW build time @ 5M | 25+ min | comparable, parallelized |
| Hybrid (BM25 + dense) | Native via tsvector + RRF in SQL | Built-in sparse vectors + fusion |
| Filtering model | SQL WHERE; index/heap interaction expensive | First-class payload index, pre-filter or post-filter |
| Transactions with app data | ACID, same tx | Separate service, dual-write |
| Backups/replication/RLS | Inherits Postgres | Manual (snapshots, distributed mode) |
| Managed cost @ ~5M, 1K QPS | $180-500/mo (RDS db.r6g.large, Supabase, Neon) | $100-300/mo (Qdrant Cloud), $30-50 self-hosted small |
| Operational surface | One service | New service: Docker/k8s cluster, custom metrics, sharding |
| License | PostgreSQL | Apache 2.0 |
Numbers compiled from Tiger Data, Markaicode, CallSphere, and Nirant Kasliwal 1M-OpenAI benchmarks (see Sources).
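The Hybrid row relies on reciprocal rank fusion (RRF) on both sides. Below is a minimal pure-Python sketch of it, assuming you already have two ranked lists of doc ids (for example one from a tsvector full-text query and one from an ORDER BY embedding <-> $1 query). This is not Qdrant's server-side fusion code or any particular SQL recipe. k=60 is the constant from the original RRF paper and a common default; treat it as tunable. The function name and sample ids are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def reciprocal_rank_fusion(
    rankings: Iterable[Sequence[str]], k: int = 60
) -> List[Tuple[str, float]]:
    """Fuse ranked lists of doc ids with RRF: score(d) = sum over lists of 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based; a doc absent from a list gets no contribution from it.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    keyword_hits = ["doc3", "doc1", "doc7"]  # hypothetical full-text ranking
    dense_hits = ["doc1", "doc9", "doc3"]    # hypothetical vector ranking
    for doc_id, score in reciprocal_rank_fusion([keyword_hits, dense_hits]):
        print(doc_id, round(score, 5))
```

Here doc1 comes out on top (ranked 2nd and 1st) ahead of doc3 (1st and 3rd). RRF only uses ranks, so the keyword and vector scores never need to be on comparable scales.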
Where pgvector wins
- Postgres already in stack. Adding vector search = CREATE EXTENSION vector; + one migration. No new service, no dual-write, no extra on-call rotation.
- Transactional consistency. Document + embedding in same tx → zero sync bugs, no eventual-consistency window between row and vector.
- Joins + SQL filters that touch business tables. WHERE tenant_id = ? AND created_at > ? ORDER BY embedding <-> $1 runs against real foreign keys; Qdrant payloads can't join.
- Cost floor. Marginal compute if existing Postgres has headroom. Tiger Data + pgvectorscale push the ceiling to 50M+ vectors at 471 QPS @ 99% recall.
- Backups, PITR, RLS, replication — inherited free.
Where Qdrant wins
- Filtered queries. pgvector's HNSW + heap-scan combo degrades sharply under selective metadata filters; Qdrant's payload index keeps filtered p50 ≈ unfiltered p50. Critical for multi-tenant RAG.
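- Memory pressure. Scalar + binary quantization with reranking gives ~65% memory cut. Lets 5M × 1536d fit on a much smaller box. pgvector's core options are halfvec (float16) and binary_quantize expression indexes with hand-written rescoring in SQL: useful, but not the same as built-in quantization with automatic rescoring.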
- Tail latency. p95 under 5ms at 1M is hard to match in pgvector under concurrency.
- Growth past 10M. Distributed mode + sharding is built in. pgvector scales vertically; pgvectorscale extends the runway but adds a non-trivial extension.
- Pure vector throughput per dollar. Rust + SIMD; 3-5× QPS/dollar at scale per several 2026 comparisons.
- GPU-accelerated HNSW index builds (Qdrant 1.13+).
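The memory-pressure point depends on a quantize-then-rescore pattern. Below is a toy pure-Python sketch of it, assuming 8-bit scalar codes over a single global min/max range. Qdrant's real scalar quantization differs: it uses quantile clipping, scores int8 codes directly with SIMD, and keeps original vectors for rescoring. All function names are hypothetical. The cheap pass dequantizes only for readability.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def fit_range(vectors: Sequence[Vector]) -> Tuple[float, float]:
    """Global [lo, hi] for scalar quantization (toy: plain min/max, no quantile clipping)."""
    return min(min(v) for v in vectors), max(max(v) for v in vectors)


def _step(lo: float, hi: float) -> float:
    # Width of one of 255 intervals; fall back to 1.0 for a degenerate range.
    return (hi - lo) / 255 or 1.0


def quantize(v: Vector, lo: float, hi: float) -> List[int]:
    """Map each float to an integer code 0..255 (1 byte per dim instead of 4)."""
    step = _step(lo, hi)
    return [max(0, min(255, round((x - lo) / step))) for x in v]


def dequantize(codes: Sequence[int], lo: float, hi: float) -> List[float]:
    step = _step(lo, hi)
    return [lo + c * step for c in codes]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def search_with_rescore(query: Vector, vectors: Sequence[Vector],
                        codes: Sequence[Sequence[int]], lo: float, hi: float,
                        k: int, oversample: int) -> List[int]:
    """Rank everything on the 8-bit approximation, then rescore only the
    top k * oversample candidates with the original float vectors."""
    approx = sorted(range(len(codes)),
                    key=lambda i: dot(query, dequantize(codes[i], lo, hi)),
                    reverse=True)
    shortlist = approx[: k * oversample]
    return sorted(shortlist, key=lambda i: dot(query, vectors[i]), reverse=True)[:k]


if __name__ == "__main__":
    rng = random.Random(0)
    dims, n, k = 16, 500, 10
    vectors = [[rng.uniform(-1, 1) for _ in range(dims)] for _ in range(n)]
    lo, hi = fit_range(vectors)
    codes = [quantize(v, lo, hi) for v in vectors]
    query = [rng.uniform(-1, 1) for _ in range(dims)]
    exact = sorted(range(n), key=lambda i: dot(query, vectors[i]), reverse=True)[:k]
    found = search_with_rescore(query, vectors, codes, lo, hi, k=k, oversample=4)
    print("recall@10:", len(set(exact) & set(found)) / k)
```

The oversample factor is the knob. A larger shortlist costs more full-precision reads but recovers neighbours whose order the 8-bit codes got slightly wrong.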
At exactly 5M vectors — decision tree
Postgres already in prod?
├─ NO → Qdrant (don't bring Postgres just for vectors)
└─ YES → Heavy metadata filtering (multi-tenant, ACL, time windows)?
         ├─ YES → Qdrant (filter performance is the deciding factor)
         └─ NO → Will the dataset grow 3-5× in 12 months?
                 ├─ YES → Qdrant (avoid migration later)
                 └─ NO → pgvector + HNSW + halfvec (cheapest, simplest)
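The same tree, written as a function so it can sit in a design doc or a lint-style check. This is a sketch. The Workload fields and the >= 3x reading of the "3-5×" branch are assumptions, not thresholds from the sources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical inputs mirroring the three questions in the tree."""
    postgres_in_prod: bool
    heavy_metadata_filtering: bool  # multi-tenant, ACL, time-window filters
    growth_multiple_12mo: float     # projected size in 12 months / size today


# Assumption: the "3-5x in 12 months" branch is read as ">= 3x".
GROWTH_THRESHOLD = 3.0


def pick_store(w: Workload) -> str:
    if not w.postgres_in_prod:
        return "qdrant"  # don't bring Postgres just for vectors
    if w.heavy_metadata_filtering:
        return "qdrant"  # filter performance is the deciding factor
    if w.growth_multiple_12mo >= GROWTH_THRESHOLD:
        return "qdrant"  # avoid a migration later
    return "pgvector + HNSW + halfvec"  # cheapest, simplest


if __name__ == "__main__":
    print(pick_store(Workload(postgres_in_prod=True,
                              heavy_metadata_filtering=False,
                              growth_multiple_12mo=1.5)))
```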
Operational realities people forget
- HNSW must fit in RAM on pgvector for production latency. At 5M × 1536d float32 ≈ 30GB just for vectors before index overhead. Provision accordingly or use halfvec (float16) to halve it.
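- pgvector has supported parallel HNSW builds since 0.6.0 (max_parallel_maintenance_workers), and builds are much faster when the graph fits in maintenance_work_mem. Even so, a 5M HNSW build can run ~25 min, and a plain CREATE INDEX blocks writes for that whole time. Plan a maintenance window or use CREATE INDEX CONCURRENTLY.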
- Qdrant snapshots ≠ Postgres PITR. If point-in-time recovery matters, that's a real operational gap.
- Qdrant payload filters need explicit payload indexes to stay fast — easy to forget and then blame "Qdrant got slow".
- Hybrid retrieval (keyword + dense + rerank) is now native in both: pgvector via tsvector full-text search + RRF in SQL (Postgres's built-in ts_rank is not true BM25), Qdrant via sparse vectors + fusion. Neither is the deciding factor anymore.
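The "≈ 30GB" figure is plain arithmetic. A sketch of that arithmetic for each encoding mentioned in this note is below. It counts raw vector bytes only. HNSW graph links, tuple headers, and metadata are excluded, so treat every result as a lower bound. The dictionary keys and function name are illustrative.

```python
# Raw bytes per dimension for each encoding discussed in this note.
BYTES_PER_DIM = {
    "float32 (vector)": 4.0,
    "float16 (halfvec)": 2.0,
    "int8 scalar quantization": 1.0,
    "1-bit binary quantization": 1 / 8,
}


def raw_vector_gb(n_vectors: int, dims: int, bytes_per_dim: float) -> float:
    """Raw vector storage in decimal GB (1e9 bytes).

    Excludes HNSW graph links, row/tuple headers, and payload/metadata,
    so real memory needs are higher than this number.
    """
    return n_vectors * dims * bytes_per_dim / 1e9


if __name__ == "__main__":
    for encoding, bpd in BYTES_PER_DIM.items():
        print(f"{encoding:>26}: {raw_vector_gb(5_000_000, 1536, bpd):6.2f} GB")
```

At 5M × 1536d this gives 30.72 GB (float32), 15.36 GB (halfvec), 7.68 GB (int8), and 0.96 GB (binary). At the 1024d used in the cost sketch, float32 is 20.48 GB. Quantized setups usually keep the float originals too, often on disk, for rescoring. Quantization therefore mainly shrinks the RAM-resident working set, not total storage.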
Cost sketch (5M × 1024d, ~1K QPS, p99 < 100ms)
- pgvector on RDS (db.r6g.large has 16 GiB, so size up or use halfvec to keep ~20GB of float32 vectors plus the index in RAM): ~$180-300/mo dedicated, or $0 marginal if shared.
- Supabase / Neon with pgvector: included in plan; watch egress and compute add-ons.
- Qdrant Cloud (managed): ~$100-300/mo for this footprint.
- Qdrant self-hosted (Hetzner / similar): $30-80/mo for the box; engineer time is the real cost.
Engineer time to run a second stateful service typically dwarfs the infra delta. Factor that in.
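A tiny break-even sketch makes the engineer-time point concrete: how many engineer-hours per month spent on the extra stateful service would cancel its infra savings. The savings range comes from the cost sketch (dedicated pgvector $180-300/mo vs self-hosted Qdrant $30-80/mo). The $150/hour loaded rate is a hypothetical placeholder; substitute your own. If pgvector shares an existing Postgres ($0 marginal), there are no savings to offset.

```python
def breakeven_eng_hours(infra_savings_usd_per_month: float,
                        loaded_hourly_rate_usd: float) -> float:
    """Monthly engineer-hours on the extra service that erase its infra savings."""
    if loaded_hourly_rate_usd <= 0:
        raise ValueError("loaded_hourly_rate_usd must be positive")
    return infra_savings_usd_per_month / loaded_hourly_rate_usd


if __name__ == "__main__":
    best_case_savings = 300 - 30   # dedicated pgvector high end vs self-hosted Qdrant low end
    worst_case_savings = 180 - 80  # dedicated pgvector low end vs self-hosted Qdrant high end
    rate = 150.0  # HYPOTHETICAL fully loaded engineer cost, USD/hour
    for label, savings in (("best case", best_case_savings),
                           ("worst case", worst_case_savings)):
        print(f"{label}: {breakeven_eng_hours(savings, rate):.1f} h/month erases the savings")
```

Under these assumptions, somewhere between under one and about two hours a month of on-call, upgrades, or snapshot babysitting is enough to cancel the infra difference.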
Counterpoints
- "They're the same speed." Aleksapolskyi's Medium benchmark on 5,500 docs found identical latency with both on HNSW — true at toy scale, irrelevant at 5M. Cited so you don't get blindsided in a meeting.
- pgvectorscale changes the math. Tiger Data's pgvectorscale (StreamingDiskANN in Rust) holds 471 QPS @ 99% recall at 50M vectors, which significantly narrows Qdrant's "scale past 10M" advantage. If you're a Postgres shop, evaluate it before defaulting to Qdrant.
- "As little infra as necessary." The 2026 industry trend is extending the relational DB, not adding specialized stores. Default bias should lean pgvector unless a concrete failure mode forces Qdrant.
- Filter benchmarks vary by query shape. "pgvector degrades to 25ms under filter" is true for selective filters that break HNSW locality; partial indexes or iterative scan (pgvector 0.8+) mitigate this. Re-benchmark on your actual filter distribution before deciding.
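Why selective filters hurt, and what iterative scan changes. pgvector's README describes the failure mode: with approximate indexes the WHERE filter is applied after the index scan. A condition matching 10% of rows, combined with the default hnsw.ef_search of 40, therefore yields only about 4 rows on average. The toy model below reproduces that. It assumes an index that simply yields rows nearest-first (row i is the i-th nearest, not a real HNSW graph) and a hypothetical filter matching every 10th row. The max_scan cap loosely mirrors the role of pgvector's hnsw.max_scan_tuples; the batching is illustrative, not pgvector's internals.

```python
from itertools import islice
from typing import Callable, Iterator, List, Tuple

Filter = Callable[[int], bool]


def candidate_stream(n_rows: int) -> Iterator[int]:
    """Toy stand-in for an ANN index: yields row ids nearest-first."""
    return iter(range(n_rows))


def post_filter(n_rows: int, ef_search: int, k: int, keep: Filter) -> List[int]:
    """Fixed budget: fetch ef_search candidates, then apply the WHERE filter."""
    candidates = islice(candidate_stream(n_rows), ef_search)
    return [row for row in candidates if keep(row)][:k]


def iterative_scan(n_rows: int, ef_search: int, k: int, keep: Filter,
                   max_scan: int) -> Tuple[List[int], int]:
    """Keep pulling ef_search-sized batches until k rows pass the filter
    or max_scan candidates have been examined."""
    stream = candidate_stream(n_rows)
    results: List[int] = []
    scanned = 0
    while len(results) < k and scanned < max_scan:
        batch = list(islice(stream, ef_search))
        if not batch:
            break  # index exhausted
        scanned += len(batch)
        results.extend(row for row in batch if keep(row))
    return results[:k], scanned


def every_tenth_row(row: int) -> bool:
    return row % 10 == 0  # hypothetical filter with 10% selectivity


if __name__ == "__main__":
    print(post_filter(100_000, ef_search=40, k=10, keep=every_tenth_row))
    print(iterative_scan(100_000, ef_search=40, k=10, keep=every_tenth_row,
                         max_scan=20_000))
```

post_filter returns only 4 of the 10 requested rows ([0, 10, 20, 30]). iterative_scan fills all 10 after examining 120 candidates. Tighter filters push that scan count up, which is where the latency cost reappears.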
Recommendation for a 5M-vector RAG in 2026
- Default: pgvector 0.9 + HNSW + halfvec on managed Postgres if Postgres is in stack and filters are simple. Cheapest, simplest, transactional.
- Switch to Qdrant if any of these are true: heavy per-query metadata filtering, multi-tenant ACL filters, sub-10ms p99 SLO, RAM-constrained host, or projected growth to 20M+ within 12 months.
- If pgvector starts hurting before any of the Qdrant triggers above apply, try pgvectorscale first; it keeps the single-stack advantage.
Sources