Date: 2026-05-24

Type: Research

Status: Tier-S comparison of pgvector (PostgreSQL) vs Qdrant for a 5 million vector RAG deployment

Sources: pgvector-vs-qdrant-5m-vector-rag-2026-05-24.sources.json
For 5M vectors, either option works well. The choice is an operations decision, not a performance decision (Afnexis, 2026). In the Tiger Data benchmark, both deliver sub-100ms query latency at 99% recall even at 10× this scale (50M vectors). The tiebreakers are your existing stack, team skills, and growth trajectory.
The most rigorous head-to-head data comes from Tiger Data's 50M vector (768-dim Cohere embeddings) benchmark using ANN-benchmarks on identical AWS r6id.4xlarge (16 vCPU, 128GB RAM) hardware:
| Metric | Postgres + pgvector + pgvectorscale | Qdrant | Δ |
|---|---|---|---|
| p50 | 31.07 ms | 30.75 ms | Qdrant 1% faster |
| p95 | 60.42 ms | 36.73 ms | Qdrant 39% faster |
| p99 | 74.60 ms | 38.71 ms | Qdrant 48% faster |
Both are sub-100ms. Qdrant's variance is tighter — critical if you have tail-latency SLAs.
Source: Tiger Data, Apr 2025 — 50M vectors, 768-dim, binary quantization on both
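The Δ column can be reproduced from the raw latencies. The sketch below assumes "X% faster" means the relative reduction in latency measured against the slower system; the helper name `percent_faster` is illustrative and not part of any benchmark tooling.

```python
# Latency figures in milliseconds, copied from the table above: (Postgres, Qdrant).
LATENCY_MS = {
    "p50": (31.07, 30.75),
    "p95": (60.42, 36.73),
    "p99": (74.60, 38.71),
}


def percent_faster(slower_ms: float, faster_ms: float) -> float:
    """Relative latency reduction of the faster system, in percent."""
    return (slower_ms - faster_ms) / slower_ms * 100.0


# Round to whole percentages, as the table does.
deltas = {
    percentile: round(percent_faster(postgres_ms, qdrant_ms))
    for percentile, (postgres_ms, qdrant_ms) in LATENCY_MS.items()
}

for percentile, pct in deltas.items():
    print(f"{percentile}: Qdrant {pct}% faster")
```

The gap widens from the median to the tail, which is the quantitative form of the "tighter variance" observation.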
| System | QPS | Ratio |
|---|---|---|
| Postgres + pgvectorscale | 471.57 | 11.4× Qdrant |
| Qdrant | 41.47 | baseline |
This is the counter-intuitive result: Postgres handles far more concurrent queries on a single node. Tiger Data attributes this to Postgres's mature concurrency control vs. Qdrant's relative immaturity in concurrent read optimization.
Source: Tiger Data; confirmed by LetsDataScience
| Metric | Postgres + pgvectorscale | Qdrant |
|---|---|---|
| p50 latency | 9.54 ms | 4.74 ms |
| p99 latency | 15.73 ms | 5.79 ms |
| QPS | 1,590 | 361 |
At 90% recall, Qdrant wins latency (roughly 2× faster) but Postgres still dominates throughput (4.4× more QPS).
| System | Build Time (50M vectors) |
|---|---|
| Qdrant | ~3.3 hours |
| Postgres + pgvectorscale | ~11.1 hours |
Qdrant's multi-threaded Rust indexer is 3.4× faster. pgvectorscale's indexer is currently single-threaded (parallel builds are in development).
At 5M vectors (10× smaller), a rough extrapolation from the 50M results (not a measurement) suggests:

- Latency: roughly 2-5ms for both systems at 99% recall (sub-linear scaling)
- QPS: 2,000-5,000+ for both (more than a typical RAG app needs)
- Index build: ~20-40 minutes for Qdrant, ~1 hour for pgvectorscale
- Index rebuild: minutes, not hours; both are manageable
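The index build estimates follow from scaling the 50M build times down by the vector count. The sketch below assumes build time scales linearly with the number of vectors, which is a simplification: real build times also depend on index parameters, hardware, and dimensionality. The function name is illustrative.

```python
# Measured build times at 50M vectors, from the Tiger Data table above (hours).
BUILD_HOURS_AT_50M = {
    "Qdrant": 3.3,
    "Postgres + pgvectorscale": 11.1,
}

MEASURED_VECTORS = 50_000_000


def scaled_build_minutes(hours_at_measured: float, target_vectors: int) -> float:
    """Estimate build time in minutes, ASSUMING linear scaling in vector count."""
    return hours_at_measured * 60.0 * target_vectors / MEASURED_VECTORS


estimates = {
    system: round(scaled_build_minutes(hours, 5_000_000))
    for system, hours in BUILD_HOURS_AT_50M.items()
}

for system, minutes in estimates.items():
    print(f"{system}: ~{minutes} min at 5M vectors")
```

Under that assumption Qdrant lands at the low end of the 20-40 minute range and pgvectorscale at about an hour.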
| Index | Best For | Trade-offs |
|---|---|---|
| HNSW (pgvector) | General-purpose, high recall | Must fit in RAM; memory-hungry at 5M+ |
| IVFFlat (pgvector) | Fast builds, low accuracy | Degrades as data changes; legacy choice |
| StreamingDiskANN (pgvectorscale) | Large-scale, cost-efficient | Keeps graph on NVMe, small RAM footprint; newer, fewer battle scars |
For 5M vectors at 768-dim, HNSW needs ~30-35GB RAM (raw vectors ≈ 15GB + graph overhead ≈ 15-20GB). StreamingDiskANN can cut this to 8-16GB by offloading to SSD.
Source: Instaclustr, Feb 2026; Tiger Data
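The raw-vector figure is plain arithmetic: vectors × dimensions × 4 bytes per float32 component. The sketch below reproduces it and adds the 15-20GB graph overhead range quoted above; that overhead range is taken from this document, not derived in the code.

```python
NUM_VECTORS = 5_000_000
DIMENSIONS = 768
BYTES_PER_FLOAT32 = 4


def raw_vector_bytes(num_vectors: int, dims: int, bytes_per_component: int) -> int:
    """Size of the uncompressed vector data alone (no index, no row overhead)."""
    return num_vectors * dims * bytes_per_component


raw_gb = raw_vector_bytes(NUM_VECTORS, DIMENSIONS, BYTES_PER_FLOAT32) / 1e9

# Graph overhead range as quoted in the text above (an input, not a result).
GRAPH_OVERHEAD_GB = (15.0, 20.0)
total_gb = tuple(raw_gb + overhead for overhead in GRAPH_OVERHEAD_GB)

print(f"raw vectors: {raw_gb:.2f} GB")
print(f"HNSW total:  {total_gb[0]:.1f}-{total_gb[1]:.1f} GB")
```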
| Feature | Details |
|---|---|
| Optimized HNSW | ACORN algorithm for filtered search; SIMD-accelerated in Rust |
| Quantization | Binary (BQ), Scalar (SQ), Product (PQ) — native, multi-level |
| On-disk storage | Vectors on disk, graph in RAM — 3-5× less RAM than pgvector HNSW |
| GPU acceleration | Available for index builds (late 2025+) |
At 5M vectors with BQ, Qdrant can operate in 4-8GB RAM vs pgvector's 32-64GB for full HNSW.
Source: Elestio, Apr 2026; Gemini research synthesis
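To see why quantization shrinks the RAM footprint so sharply, compare bytes per vector under each encoding. This is a minimal sketch, assuming binary quantization stores 1 bit per dimension and scalar quantization stores 1 byte (int8) per dimension. It counts only the quantized vector data and ignores the HNSW graph, payload indexes, and the original vectors kept on disk.

```python
NUM_VECTORS = 5_000_000
DIMENSIONS = 768

# Bits stored per dimension under each encoding (assumptions stated above).
BITS_PER_DIMENSION = {
    "float32": 32,
    "scalar (int8)": 8,
    "binary": 1,
}


def footprint_gb(bits_per_dim: int) -> float:
    """Total size of the encoded vectors in GB (1 GB = 1e9 bytes)."""
    bytes_per_vector = DIMENSIONS * bits_per_dim // 8
    return NUM_VECTORS * bytes_per_vector / 1e9


footprints = {name: footprint_gb(bits) for name, bits in BITS_PER_DIMENSION.items()}

for name, gb in footprints.items():
    ratio = footprints["float32"] / gb
    print(f"{name:>14}: {gb:6.2f} GB ({ratio:.0f}x smaller than float32)")
```

The binary-quantized vectors come to under half a gigabyte, which leaves room for the graph and working memory inside the 4-8GB range quoted above.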
This is where the two diverge meaningfully:
| Aspect | pgvector | Qdrant |
|---|---|---|
| Filter syntax | SQL `WHERE` clauses | Dedicated payload filter API |
| Filter approach | Postgres planner uses B-tree indexes first, then HNSW on subset | Filters applied during HNSW traversal via bitmasks |
| Complex filters | JOINs, subqueries, full SQL expressiveness | Structured `must`/`should`/`must_not` conditions |
| Historical issue | "Over-filtering" problem (HNSW skips valid results) | Purpose-built for filter-during-traversal |
| 2025-2026 fix | pgvector 0.8.0+ Iterative Scan improved filtered perf ~9× | N/A (was already good) |
| Best for | Relational queries, JOINs, SQL-native teams | Complex multi-attribute filters, real-time conditional search |
Verdict: For simple RAG (filter by doc type + tenant ID), both are fine. For complex multi-conditional metadata search, Qdrant is architecturally superior.
Source: Elestio, 2026; Tiger Data
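For the simple RAG case (filter by doc type and tenant ID), the two filter styles look roughly as follows. This is a sketch under stated assumptions: the table `chunks`, its columns, and the payload keys `tenant_id` and `doc_type` are hypothetical, the SQL uses psycopg-style named placeholders, and the Qdrant body follows the shape of its REST search request (`vector`, `filter`, `limit`). Nothing here talks to a database; it only builds the query text and the request body.

```python
import json

# pgvector: metadata filter in WHERE, similarity ordering via the cosine
# distance operator <=>. Table and column names are hypothetical.
PGVECTOR_QUERY = """
SELECT id, content
FROM chunks
WHERE tenant_id = %(tenant_id)s
  AND doc_type = %(doc_type)s
ORDER BY embedding <=> %(query_vec)s::vector
LIMIT %(k)s
"""

# pgvector 0.8.0+ iterative scan, which keeps scanning the index when the
# filter discards too many candidates. Set per session or transaction.
ITERATIVE_SCAN_SQL = "SET hnsw.iterative_scan = strict_order"


def qdrant_filter(tenant_id: int, doc_type: str) -> dict:
    """Payload filter: every condition under 'must' has to match."""
    return {
        "must": [
            {"key": "tenant_id", "match": {"value": tenant_id}},
            {"key": "doc_type", "match": {"value": doc_type}},
        ]
    }


def qdrant_search_body(query_vec: list, tenant_id: int, doc_type: str, k: int = 5) -> dict:
    """Request body in the shape of Qdrant's REST search endpoint."""
    return {
        "vector": query_vec,
        "filter": qdrant_filter(tenant_id, doc_type),
        "limit": k,
    }


body = qdrant_search_body([0.1, 0.2, 0.3], tenant_id=42, doc_type="policy")
print(json.dumps(body, indent=2))
```

The SQL version can grow into JOINs and subqueries; the Qdrant version grows by nesting `must`, `should`, and `must_not` clauses.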
| Resource | pgvector (HNSW) | pgvector (StreamingDiskANN) | Qdrant (BQ, on-disk) |
|---|---|---|---|
| Raw vector data | ~15 GB | ~15 GB (on SSD) | ~15 GB (on SSD) |
| Index in RAM | ~20 GB | ~2-4 GB | ~2-4 GB |
| Total RAM needed | 32-64 GB | 16-32 GB | 4-8 GB |
| SSD | Nice to have | Required (NVMe) | Required (NVMe) |
| Typical instance | r6i.2xlarge | r6i.2xlarge | c6i.xlarge |
Key insight: At 5M vectors, RAM is the primary cost driver. Qdrant's quantization + on-disk approach means you can run on a much smaller instance.
Source: Elestio, 2026; Gemini research; Second Talent, 2026
| Aspect | pgvector | Qdrant |
|---|---|---|
| Deployment | `CREATE EXTENSION vector;` and you're done (the pgvector extension is registered as `vector`) | Separate Docker/K8s service |
| Existing stack | No new infra if you have Postgres | New DB, new SDK, new monitoring |
| Backups | `pg_dump`, unified with app data | Separate backup strategy |
| Transactions | Full ACID — vectors + metadata in one tx | Eventual consistency for vector writes |
| Data sync | None needed — it's one database | Must maintain ETL pipeline from source DB |
| Observability | `pg_stat_statements`, existing PG tooling | Qdrant dashboard + Prometheus metrics |
| Scaling | Scale-up (bigger instance); Citus for sharding | Built-in horizontal sharding |
| Team knowledge | Any Postgres DBA can run it | New operational domain to learn |
| Replication | Native streaming replication | Raft-based consensus (3+ nodes for HA) |
Verdict: pgvector wins operational simplicity by a wide margin. Qdrant adds a second database to your stack with its own failure modes, backup cadence, and on-call burden.
Source: Elestio, 2026; Tiger Data
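The "one database, one transaction" point can be illustrated with a short sketch. It assumes a hypothetical two-table schema (`documents`, `chunks`) and any DB-API 2.0 connection to Postgres (for example one from psycopg). The setup statements use pgvector's documented names: the extension is `vector`, the column type is `vector(n)`, and the HNSW operator class for cosine distance is `vector_cosine_ops`.

```python
# One-time setup. Table and index names are hypothetical.
SETUP_SQL = [
    "CREATE EXTENSION IF NOT EXISTS vector",
    "CREATE TABLE IF NOT EXISTS documents ("
    "id bigserial PRIMARY KEY, title text NOT NULL)",
    "CREATE TABLE IF NOT EXISTS chunks ("
    "id bigserial PRIMARY KEY, "
    "document_id bigint REFERENCES documents(id), "
    "content text NOT NULL, "
    "embedding vector(768))",
    "CREATE INDEX IF NOT EXISTS chunks_embedding_hnsw "
    "ON chunks USING hnsw (embedding vector_cosine_ops)",
]


def to_vector_literal(values) -> str:
    """Format a sequence of numbers as pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def insert_document(conn, title: str, chunks) -> int:
    """Insert a document row and its chunk embeddings atomically.

    `conn` is any DB-API 2.0 connection; `chunks` is a list of
    (content, embedding) pairs. A single commit covers metadata and vectors,
    so readers never see a document without its embeddings.
    """
    cur = conn.cursor()
    try:
        cur.execute(
            "INSERT INTO documents (title) VALUES (%s) RETURNING id", (title,)
        )
        document_id = cur.fetchone()[0]
        for content, embedding in chunks:
            cur.execute(
                "INSERT INTO chunks (document_id, content, embedding) "
                "VALUES (%s, %s, %s::vector)",
                (document_id, content, to_vector_literal(embedding)),
            )
        conn.commit()
        return document_id
    except Exception:
        conn.rollback()  # neither the document nor any chunk is persisted
        raise
```

With a separate vector store, the equivalent write needs two systems and some form of sync or retry logic between them, which is the ETL burden listed in the table.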
| Cost Component | pgvector | Qdrant |
|---|---|---|
| Compute (AWS) | r6i.2xlarge ≈ $400/mo | c6i.xlarge ≈ $170/mo |
| Engineering overhead | Minimal (existing PG expertise) | 5-10 hrs/mo maintaining second DB |
| ETL/sync pipeline | $0 | Development + maintenance cost |
| Monitoring | Existing PG tools | Additional Prometheus/Grafana |
| True TCO estimate | $400-500/mo | $500-800/mo (including labor) |
Managed cloud:

- Postgres (RDS/Aurora) with pgvector: $140-220/mo
- Qdrant Cloud: $90-180/mo
The managed Qdrant pricing is cheaper, but self-hosted pgvector wins on TCO because you're not adding infrastructure.
Source: Gemini research; Tiger Data ($835/mo for 50M on r6id.4xlarge)
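The TCO rows can be sanity-checked by backing out the labor rate they imply. The sketch below assumes the entire gap between compute cost and the TCO estimate is engineering time, ignoring ETL development and monitoring costs, so the result is an upper bound on the implied hourly rate rather than a sourced figure.

```python
# Figures from the cost table above (USD per month, hours per month).
QDRANT_COMPUTE = 170
QDRANT_TCO_RANGE = (500, 800)
QDRANT_LABOR_HOURS = (5, 10)

PGVECTOR_COMPUTE = 400
PGVECTOR_TCO_RANGE = (400, 500)


def implied_hourly_rate(tco: float, compute: float, hours: float) -> float:
    """Labor rate implied if everything above compute cost is engineering time."""
    return (tco - compute) / hours


qdrant_rates = tuple(
    implied_hourly_rate(tco, QDRANT_COMPUTE, hours)
    for tco, hours in zip(QDRANT_TCO_RANGE, QDRANT_LABOR_HOURS)
)

# Non-compute overhead attributed to pgvector by the same table.
pgvector_overhead = tuple(tco - PGVECTOR_COMPUTE for tco in PGVECTOR_TCO_RANGE)

print(f"Qdrant implied labor rate: ${qdrant_rates[1]:.0f}-{qdrant_rates[0]:.0f}/hr")
print(f"pgvector overhead: ${pgvector_overhead[0]}-{pgvector_overhead[1]}/mo")
```

If your loaded engineering cost is well above or below the implied rate, rescale the Qdrant TCO accordingly before comparing.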
| Dimension | pgvector (Postgres) | Qdrant | Winner (at 5M) |
|---|---|---|---|
| Single-query latency | 5-30ms | 3-15ms | 🏆 Qdrant |
| Concurrent throughput (QPS) | 1,500-4,700+ | 300-400+ | 🏆 pgvector |
| Index build speed | Slower (serial) | Faster (parallel) | 🏆 Qdrant |
| RAM efficiency | 32-64 GB | 4-8 GB | 🏆 Qdrant |
| Metadata filtering | Good (SQL) | Best (native payload indexes) | 🏆 Qdrant |
| Operational simplicity | Minimal (extension) | Medium (separate service) | 🏆 pgvector |
| ACID transactions | Full support | Limited | 🏆 pgvector |
| SQL/JOIN support | Native | None | 🏆 pgvector |
| Horizontal scaling | Requires Citus | Built-in sharding | 🏆 Qdrant |
| TCO (self-hosted) | Lower | Higher (labor costs) | 🏆 pgvector |
| Ecosystem maturity | 30+ years (PG) | 4 years | 🏆 pgvector |
Score: pgvector 6, Qdrant 5, counting rows in the table above (on raw technical merit at this scale, they're extremely close; the "winner" depends on your specific constraints)
"pgvector is all you need" is a vendor narrative. Tiger Data (formerly Timescale) makes pgvectorscale. Their benchmark shows Postgres winning on throughput, but they note they "had trouble finding the right parameters for Qdrant" and spent weeks iterating. A Qdrant-optimized benchmark might tell a different story. (Tiger Data)
The NirantK 1M benchmark tells the opposite story. At 1M vectors (2023), pgvector lagged Qdrant by 15× in throughput with worse accuracy. pgvector has improved dramatically since then (HNSW in 0.5.0, iterative scan in 0.8.0, StreamingDiskANN via pgvectorscale), but the gap at small-to-medium scale was real and may still exist without pgvectorscale. (NirantK, 2023)
5M vectors is the "sweet spot where either works." At 5M, you're below the threshold where Qdrant's architectural advantages (on-disk, horizontal sharding) become decisive. If you're at 5M today but heading to 50M, starting with Qdrant avoids a painful migration later. (Elestio, 2026)
No credible counterpoint was found suggesting that pgvector's HNSW is faster than Qdrant's HNSW on single-query latency; Qdrant wins that metric in every benchmark reviewed here.
For a 5M-vector RAG system in 2026:
Start with pgvector if you already have Postgres. You'll be operational in hours, not days. The performance is more than sufficient for RAG (sub-30ms queries, thousands of QPS). You can always migrate to Qdrant later if you hit scale or latency ceilings — and the SQL-first approach makes that migration straightforward since your data is already relational.
Start with Qdrant only if you have a clear, measurable requirement that pgvector can't meet: sub-10ms p99 SLAs, complex multi-attribute filtering at high QPS, or a growth plan that takes you past 20M vectors within 18 months. In those cases, Qdrant's purpose-built architecture will save you from an eventual migration.
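The recommendation above can be written down as a small decision rule. This is a sketch of the reasoning in this section, not a sizing tool: the field names are hypothetical, the thresholds (10ms p99, 20M vectors within 18 months) are the ones stated above, and the fallback for teams without Postgres is an assumption based on the earlier "either works" conclusion.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Requirements:
    has_postgres: bool
    p99_sla_ms: Optional[float] = None          # None means no tail-latency SLA
    complex_filters_at_high_qps: bool = False
    projected_vectors_18mo: int = 5_000_000


def recommend(req: Requirements) -> str:
    """Encode the recommendation for a ~5M-vector RAG system."""
    needs_qdrant = (
        (req.p99_sla_ms is not None and req.p99_sla_ms < 10)
        or req.complex_filters_at_high_qps
        or req.projected_vectors_18mo > 20_000_000
    )
    if needs_qdrant:
        return "qdrant"
    if req.has_postgres:
        return "pgvector"
    # Assumption: with no existing Postgres and no hard requirement,
    # the choice comes down to team skills and operations.
    return "either"


print(recommend(Requirements(has_postgres=True)))
print(recommend(Requirements(has_postgres=True, p99_sla_ms=8)))
print(recommend(Requirements(has_postgres=False, projected_vectors_18mo=50_000_000)))
```

Note that the Qdrant triggers take precedence even when Postgres is already in place, matching the "clear, measurable requirement" wording above.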
Research conducted 2026-05-24 using DuckDuckGo, Gemini CLI, and direct source scraping. Sources tracked in companion .sources.json file.