pgvector vs Qdrant for a 5M-Vector RAG System

Date: 2026-05-24
Type: Research
Status: Tier-S comparison of pgvector (PostgreSQL) vs Qdrant for a 5 million vector RAG deployment
Sources: pgvector-vs-qdrant-5m-vector-rag-2026-05-24.sources.json

TL;DR

For 5M vectors, either option works well. The choice is an operations decision, not a performance decision (Afnexis, 2026). In Tiger Data's benchmark, both deliver sub-100ms query latency at 99% recall even at 10× your scale (50M vectors). The tiebreakers are your existing stack, team skills, and growth trajectory:

Pick pgvector if you already run Postgres, value ACID transactions, want one database, and your RAG workload is multi-tenant or read-heavy with moderate concurrency.
Pick Qdrant if you need sub-10ms tail latency SLAs, complex metadata filtering at scale, aggressive RAM compression, or plan to grow well beyond 5M vectors.

1. Benchmark Numbers at Scale

The most rigorous head-to-head data comes from Tiger Data's 50M vector (768-dim Cohere embeddings) benchmark using ANN-benchmarks on identical AWS r6id.4xlarge (16 vCPU, 128GB RAM) hardware:

Query Latency (99% Recall)

Metric	Postgres + pgvector + pgvectorscale	Qdrant	Δ
p50	31.07 ms	30.75 ms	Qdrant 1% faster
p95	60.42 ms	36.73 ms	Qdrant 39% faster
p99	74.60 ms	38.71 ms	Qdrant 48% faster

Both are sub-100ms. Qdrant's variance is tighter — critical if you have tail-latency SLAs.

Source: Tiger Data, Apr 2025 — 50M vectors, 768-dim, binary quantization on both
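
The Δ column can be reproduced from the two latency columns. A minimal sketch, assuming "X% faster" means the relative latency reduction against the slower system (the function name is illustrative):

```python
# Latencies in ms from the 99%-recall table: (Postgres + pgvectorscale, Qdrant).
LATENCY_MS = {
    "p50": (31.07, 30.75),
    "p95": (60.42, 36.73),
    "p99": (74.60, 38.71),
}


def percent_faster(slower_ms: float, faster_ms: float) -> float:
    """Relative latency reduction of the faster system, in percent."""
    return (slower_ms - faster_ms) / slower_ms * 100


deltas = {name: percent_faster(pg, qd) for name, (pg, qd) in LATENCY_MS.items()}

for name, delta in deltas.items():
    print(f"{name}: Qdrant {delta:.0f}% faster")
```

The gap widens from p50 to p99, which is the "tighter variance" point in numeric form.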

Query Throughput (QPS) at 99% Recall

System	QPS	Ratio
Postgres + pgvectorscale	471.57	11.4× Qdrant
Qdrant	41.47	baseline

This is the counter-intuitive result: Postgres handles far more concurrent queries on a single node. Tiger Data attributes this to Postgres's mature concurrency control vs. Qdrant's relative immaturity in concurrent read optimization.

Source: Tiger Data; confirmed by LetsDataScience

At Lower Recall (90%)

Metric	Postgres + pgvectorscale	Qdrant
p50 latency	9.54 ms	4.74 ms
p99 latency	15.73 ms	5.79 ms
QPS	1,590	361

At 90% recall, Qdrant wins on latency (2.0× faster at p50, 2.7× at p99), but Postgres still dominates throughput (4.4× more QPS).
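
To make the throughput gap concrete, the sketch below recomputes the QPS ratios and estimates how many nodes each system would need for a target load. The 400 QPS target is a hypothetical figure, and the replica count assumes read throughput scales linearly with replicas, which the benchmark does not test.

```python
import math

# Single-node QPS at 50M vectors: (Postgres + pgvectorscale, Qdrant).
QPS = {
    "99% recall": (471.57, 41.47),
    "90% recall": (1590.0, 361.0),
}


def throughput_ratio(pg_qps: float, qdrant_qps: float) -> float:
    """How many times more queries per second Postgres served."""
    return pg_qps / qdrant_qps


def replicas_needed(target_qps: float, qps_per_node: float) -> int:
    """Nodes needed for a target load, ASSUMING linear scaling with replicas."""
    return math.ceil(target_qps / qps_per_node)


ratios = {recall: throughput_ratio(pg, qd) for recall, (pg, qd) in QPS.items()}

TARGET_QPS = 400  # hypothetical load, not a figure from the benchmark
pg_nodes = replicas_needed(TARGET_QPS, QPS["99% recall"][0])
qdrant_nodes = replicas_needed(TARGET_QPS, QPS["99% recall"][1])
print(ratios, pg_nodes, qdrant_nodes)
```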

Index Build Time

System	Build Time (50M vectors)
Qdrant	~3.3 hours
Postgres + pgvectorscale	~11.1 hours

Qdrant's multi-threaded Rust indexer is 3.4× faster. pgvectorscale's indexer is currently single-threaded (parallel builds are in development).

Extrapolating to 5M Vectors

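These are rough extrapolations from the 50M results, not measurements. At 5M (10× smaller), expect:
- Latency: roughly 5-30 ms for pgvector and 3-15 ms for Qdrant at 99% recall. ANN search scales sub-linearly, so a 10× smaller dataset does not give 10× lower latency.
- QPS: roughly 1,500-4,700+ for pgvector and 300-400+ for Qdrant, both above the 100 QPS read workload assumed in the cost section.
- Index build: ~20-40 minutes for Qdrant, ~1 hour for pgvectorscale.
- Index rebuild: about an hour at worst, so both are manageable.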
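
The build-time estimates follow from scaling the 50M figures down linearly. A minimal sketch under that assumption (real build time also depends on index parameters, hardware, and parallelism, so treat the output as an order-of-magnitude guide):

```python
# Build times in hours at 50M vectors, from the table above.
BUILD_HOURS_50M = {"Qdrant": 3.3, "pgvectorscale": 11.1}
BASE_VECTORS = 50_000_000


def scaled_build_minutes(hours_at_base: float, target_vectors: int) -> float:
    """Linear extrapolation of build time to a different dataset size."""
    return hours_at_base * 60 * target_vectors / BASE_VECTORS


estimates = {
    system: scaled_build_minutes(hours, 5_000_000)
    for system, hours in BUILD_HOURS_50M.items()
}
speedup = BUILD_HOURS_50M["pgvectorscale"] / BUILD_HOURS_50M["Qdrant"]

for system, minutes in estimates.items():
    print(f"{system}: about {minutes:.0f} minutes at 5M vectors")
```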

2. Index Types and Algorithms

pgvector / pgvectorscale

Index	Best For	Trade-offs
HNSW (pgvector)	General-purpose, high recall	Must fit in RAM; memory-hungry at 5M+
IVFFlat (pgvector)	Fast builds, low accuracy	Degrades as data changes; legacy choice
StreamingDiskANN (pgvectorscale)	Large-scale, cost-efficient	Keeps graph on NVMe, small RAM footprint; newer, fewer battle scars

For 5M vectors at 768-dim, HNSW needs ~30-35GB RAM (raw vectors ≈ 15GB + graph overhead ≈ 15-20GB). StreamingDiskANN can cut this to 8-16GB by offloading to SSD.

Source: Instaclustr, Feb 2026; Tiger Data
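
The ~30-35GB figure can be checked with simple arithmetic. The sketch assumes float32 storage (4 bytes per dimension) and decimal gigabytes, and takes the 15-20GB graph overhead range from the text as given rather than deriving it.

```python
def raw_vector_gb(num_vectors: int, dims: int, bytes_per_value: int = 4) -> float:
    """Size of uncompressed vectors in decimal gigabytes (float32 by default)."""
    return num_vectors * dims * bytes_per_value / 1e9


raw_gb = raw_vector_gb(5_000_000, 768)

# Graph overhead range quoted in the text, not computed here.
GRAPH_OVERHEAD_GB = (15, 20)
total_gb = tuple(raw_gb + overhead for overhead in GRAPH_OVERHEAD_GB)

print(f"raw vectors: {raw_gb:.2f} GB")
print(f"HNSW total: {total_gb[0]:.0f}-{total_gb[1]:.0f} GB")
```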

Qdrant

Feature	Details
Optimized HNSW	ACORN algorithm for filtered search; SIMD-accelerated in Rust
Quantization	Binary (BQ), Scalar (SQ), Product (PQ) — native, multi-level
On-disk storage	Vectors on disk, graph in RAM — 3-5× less RAM than pgvector HNSW
GPU acceleration	Available for index builds (late 2025+)

At 5M vectors with BQ, Qdrant can operate in 4-8GB RAM, versus a 32-64GB instance for pgvector with a full in-memory HNSW index.

Source: Elestio, Apr 2026; Gemini research synthesis
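
The RAM savings from quantization come from shrinking each stored dimension. A rough sketch, assuming scalar quantization stores one byte per dimension and binary quantization one bit per dimension, and ignoring the HNSW graph, payloads, and the original vectors kept on disk for rescoring:

```python
NUM_VECTORS = 5_000_000
DIMS = 768

# Bits stored per dimension under each representation (assumed encodings).
BITS_PER_DIM = {"float32": 32, "scalar (int8)": 8, "binary": 1}


def quantized_gb(num_vectors: int, dims: int, bits_per_dim: int) -> float:
    """Vector storage in decimal gigabytes for a given per-dimension width."""
    return num_vectors * dims * bits_per_dim / 8 / 1e9


footprint_gb = {
    name: quantized_gb(NUM_VECTORS, DIMS, bits) for name, bits in BITS_PER_DIM.items()
}

for name, gb in footprint_gb.items():
    ratio = footprint_gb["float32"] / gb
    print(f"{name}: {gb:.2f} GB ({ratio:.0f}x smaller than float32)")
```

Under these assumptions the binary-quantized vectors take well under 1 GB, which is why the 4-8GB total is plausible once the graph and working memory are added.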

3. Metadata Filtering

This is where the two diverge meaningfully:

Aspect	pgvector	Qdrant
Filter syntax	SQL `WHERE` clauses	Dedicated payload filter API
Filter approach	Postgres planner either filters via a B-tree index and ranks the subset exactly, or scans HNSW and applies the filter afterward	Filters applied during HNSW traversal via bitmasks
Complex filters	`JOIN`s, subqueries, full SQL expressiveness	Structured `must`/`should`/`must_not` conditions
Historical issue	"Overfiltering" (post-filtered HNSW scans could return fewer rows than requested)	Purpose-built for filter-during-traversal
2025-2026 fix	pgvector 0.8.0+ Iterative Scan improved filtered perf ~9×	N/A (was already good)
Best for	Relational queries, JOINs, SQL-native teams	Complex multi-attribute filters, real-time conditional search

Verdict: For simple RAG (filter by doc type + tenant ID), both are fine. For complex multi-conditional metadata search, Qdrant is architecturally superior.

Source: Elestio, 2026; Tiger Data
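
To illustrate the two filter styles, here is the same "tenant + doc type" query expressed both ways. This is a sketch, not a reference implementation: the `chunks` table, its columns, and the payload keys are hypothetical, the `%(name)s` placeholders assume a psycopg-style driver, and the Qdrant body follows the shape of its REST search request, so check it against the current API reference before use.

```python
import json

# pgvector 0.8.0+: enable iterative scan so a filtered HNSW query keeps
# scanning until enough rows pass the WHERE clause.
PG_ENABLE_ITERATIVE_SCAN = "SET hnsw.iterative_scan = strict_order;"

# Hypothetical schema: chunks(id, tenant_id, doc_type, content, embedding vector(768)).
# `<=>` is pgvector's cosine distance operator.
PG_FILTERED_QUERY = """
SELECT id, content
FROM chunks
WHERE tenant_id = %(tenant_id)s AND doc_type = %(doc_type)s
ORDER BY embedding <=> %(query_vec)s::vector
LIMIT 5;
"""


def qdrant_search_body(query_vec, tenant_id, doc_type, limit=5):
    """Request body with a structured `must` filter on two payload keys."""
    return {
        "vector": query_vec,
        "limit": limit,
        "filter": {
            "must": [
                {"key": "tenant_id", "match": {"value": tenant_id}},
                {"key": "doc_type", "match": {"value": doc_type}},
            ]
        },
    }


body = qdrant_search_body([0.0] * 768, "acme", "faq")
print(json.dumps(body["filter"], indent=2))
```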

4. Memory & Hardware Requirements

5M Vectors, 768 Dimensions (OpenAI/Cohere-scale)

Resource	pgvector (HNSW)	pgvector (StreamingDiskANN)	Qdrant (BQ, on-disk)
Raw vector data	~15 GB	~15 GB (on SSD)	~15 GB (on SSD)
Index in RAM	~20 GB	~2-4 GB	~2-4 GB
Total RAM needed	32-64 GB	16-32 GB	4-8 GB
SSD	Nice to have	Required (NVMe)	Required (NVMe)
Typical instance	r6i.2xlarge	r6i.2xlarge	c6i.xlarge

Key insight: At 5M vectors, RAM is the primary cost driver. Qdrant's quantization + on-disk approach means you can run on a much smaller instance.

Source: Elestio, 2026; Gemini research; Second Talent, 2026

5. Operational Complexity

Aspect	pgvector	Qdrant
Deployment	`CREATE EXTENSION vector;`, done	Separate Docker/K8s service
Existing stack	No new infra if you have Postgres	New DB, new SDK, new monitoring
Backups	`pg_dump` — unified with app data	Separate backup strategy
Transactions	Full ACID — vectors + metadata in one tx	Eventual consistency for vector writes
Data sync	None needed — it's one database	Must maintain ETL pipeline from source DB
Observability	`pg_stat_statements`, existing PG tooling	Qdrant dashboard + Prometheus metrics
Scaling	Scale-up (bigger instance); Citus for sharding	Built-in horizontal sharding
Team knowledge	Any Postgres DBA can run it	New operational domain to learn
Replication	Native streaming replication	Raft-based consensus (3+ nodes for HA)

Verdict: pgvector wins operational simplicity by a wide margin. Qdrant adds a second database to your stack with its own failure modes, backup cadence, and on-call burden.

Source: Elestio, 2026; Tiger Data
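
The "vectors + metadata in one tx" row is the practical core of the simplicity argument. A minimal sketch of such a write, assuming a DB-API 2.0 connection with `%s` placeholders (as in psycopg) and hypothetical `chunks` and `documents` tables:

```python
def store_chunk(conn, doc_id, content, embedding):
    """Insert a chunk and update its parent document atomically.

    Either both statements commit or neither does, so the vector index
    never references a chunk the relational side does not know about.
    """
    # pgvector accepts vectors as text literals such as '[0.5,1.0]'.
    vector_literal = "[" + ",".join(str(float(x)) for x in embedding) + "]"
    cur = conn.cursor()
    try:
        cur.execute(
            "INSERT INTO chunks (doc_id, content, embedding) "
            "VALUES (%s, %s, %s::vector)",
            (doc_id, content, vector_literal),
        )
        cur.execute(
            "UPDATE documents SET chunk_count = chunk_count + 1 WHERE id = %s",
            (doc_id,),
        )
        conn.commit()
    except Exception:
        conn.rollback()
        raise
```

With a separate vector store, the equivalent write spans two systems, so the application has to handle the case where one side succeeds and the other fails.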

6. Cost of Ownership (Monthly Estimate, Self-Hosted)

5M Vectors, 768-dim, 100 QPS read workload

Cost Component	pgvector	Qdrant
Compute (AWS)	r6i.2xlarge ≈ $400/mo	c6i.xlarge ≈ $170/mo
Engineering overhead	Minimal (existing PG expertise)	5-10 hrs/mo maintaining second DB
ETL/sync pipeline	$0	Development + maintenance cost
Monitoring	Existing PG tools	Additional Prometheus/Grafana
True TCO estimate	$400-500/mo	$500-800/mo (including labor)

Managed cloud:
- Postgres (RDS/Aurora) with pgvector: $140-220/mo
- Qdrant Cloud: $90-180/mo

The managed Qdrant pricing is cheaper, but self-hosted pgvector wins on TCO if you already run Postgres, because you're not adding infrastructure.

Source: Gemini research; Tiger Data ($835/mo for 50M on r6id.4xlarge)
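
The Qdrant TCO range implies a labor rate that is worth making explicit. The sketch below back-solves it under a simplifying assumption that is not stated in the sources: TCO is compute plus engineering hours only, with ETL and monitoring costs ignored.

```python
def monthly_tco(compute_usd: float, labor_hours: float, hourly_rate_usd: float) -> float:
    """Monthly cost as compute plus engineering time (simplified model)."""
    return compute_usd + labor_hours * hourly_rate_usd


def implied_hourly_rate(tco_usd: float, compute_usd: float, labor_hours: float) -> float:
    """Labor rate that reconciles a TCO figure with compute and hours."""
    return (tco_usd - compute_usd) / labor_hours


QDRANT_COMPUTE_USD = 170  # c6i.xlarge estimate from the table

low_rate = implied_hourly_rate(500, QDRANT_COMPUTE_USD, 5)    # low end: 5 hrs/mo
high_rate = implied_hourly_rate(800, QDRANT_COMPUTE_USD, 10)  # high end: 10 hrs/mo

print(f"implied labor rate: ${high_rate:.0f}-{low_rate:.0f}/hour")
```

If your loaded engineering cost is well above that range, the gap in favor of pgvector widens; if nobody needs to maintain the second database, it narrows.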

7. When to Choose Each

Choose pgvector when:

✅ You already run Postgres in production
✅ Your RAG app is multi-tenant (most queries filtered by tenant)
✅ You need ACID transactions across vectors + metadata
✅ Your team has Postgres expertise but no vector-DB experience
✅ 5M is near your ceiling (not planning 50M+ soon)
✅ Operational simplicity matters more than shaving 5ms off p99

Choose Qdrant when:

✅ Sub-10ms tail latency SLAs are a hard requirement
✅ Complex metadata filtering is central to your product
✅ You plan to scale to 20M+ vectors within 12 months
✅ RAM budget is tight (can't afford 32GB+ instances)
✅ You need horizontal scaling across multiple nodes
✅ You're building a standalone search/recommendation service (not tightly coupled to a relational DB)
✅ Your team is comfortable running a separate vector infrastructure

8. Head-to-Head Summary

Dimension	pgvector (Postgres)	Qdrant	Winner (at 5M)
Single-query latency	5-30ms	3-15ms	🏆 Qdrant
Concurrent throughput (QPS)	1,500-4,700+	300-400+	🏆 pgvector
Index build speed	Slower (serial)	Faster (parallel)	🏆 Qdrant
RAM efficiency	32-64 GB	4-8 GB	🏆 Qdrant
Metadata filtering	Good (SQL)	Best (native payload indexes)	🏆 Qdrant
Operational simplicity	Minimal (extension)	Medium (separate service)	🏆 pgvector
ACID transactions	Full support	Limited	🏆 pgvector
SQL/JOIN support	Native	None	🏆 pgvector
Horizontal scaling	Requires Citus	Built-in sharding	🏆 Qdrant
TCO (self-hosted)	Lower	Higher (labor costs)	🏆 pgvector
Ecosystem maturity	30+ years (PG)	4 years	🏆 pgvector

Score: pgvector 6, Qdrant 5 (on raw technical merit at this scale, they're extremely close; the "winner" depends on your specific constraints)

Counterpoints

"pgvector is all you need" is a vendor narrative. Tiger Data (formerly Timescale) makes pgvectorscale. Their benchmark shows Postgres winning on throughput, but they note they "had trouble finding the right parameters for Qdrant" and spent weeks iterating. A Qdrant-optimized benchmark might tell a different story. (Tiger Data)
The NirantK 1M benchmark tells the opposite story. At 1M vectors (2023), pgvector lagged Qdrant by 15× in throughput with worse accuracy. pgvector has improved dramatically since then (HNSW in 0.5.0, iterative scan in 0.8.0, StreamingDiskANN via pgvectorscale), but the gap at small-to-medium scale was real and may still exist without pgvectorscale. (NirantK, 2023)
5M vectors is the "sweet spot where either works." At 5M, you're below the threshold where Qdrant's architectural advantages (on-disk, horizontal sharding) become decisive. If you're at 5M today but heading to 50M, starting with Qdrant avoids a painful migration later. (Elestio, 2026)
No credible counterpoints were found suggesting pgvector's HNSW is faster than Qdrant's HNSW on single-query latency. Qdrant wins that metric in every benchmark reviewed here.

Recommendation

For a 5M-vector RAG system in 2026:

Start with pgvector if you already have Postgres. You'll be operational in hours, not days. The performance is more than sufficient for RAG (sub-30ms queries, thousands of QPS). You can always migrate to Qdrant later if you hit scale or latency ceilings — and the SQL-first approach makes that migration straightforward since your data is already relational.

Start with Qdrant only if you have a clear, measurable requirement that pgvector can't meet: sub-10ms p99 SLAs, complex multi-attribute filtering at high QPS, or a growth plan that takes you past 20M vectors within 12 months. In those cases, Qdrant's purpose-built architecture will save you from an eventual migration.
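
The recommendation can be written down as a small decision helper. This is one possible encoding, not a definitive rule set: the field names are illustrative, the thresholds are the ones stated above, and the "either" fallback for teams without Postgres and without hard requirements is an added simplification.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Requirements:
    runs_postgres: bool
    p99_sla_ms: Optional[float] = None        # None means no hard latency SLA
    complex_filters_at_high_qps: bool = False
    projected_vectors: int = 5_000_000        # size expected within the planning horizon


def recommend(req: Requirements) -> Tuple[str, List[str]]:
    """Return a starting choice and the reasons behind it."""
    reasons = []
    if req.p99_sla_ms is not None and req.p99_sla_ms < 10:
        reasons.append("sub-10ms p99 SLA")
    if req.complex_filters_at_high_qps:
        reasons.append("complex multi-attribute filtering at high QPS")
    if req.projected_vectors > 20_000_000:
        reasons.append("growth past 20M vectors")
    if reasons:
        return "Qdrant", reasons
    if req.runs_postgres:
        return "pgvector", ["Postgres already in the stack"]
    return "either", ["no hard requirement; decide on team skills"]


choice, why = recommend(Requirements(runs_postgres=True))
print(choice, why)
```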

Research conducted 2026-05-24 using DuckDuckGo, Gemini CLI, and direct source scraping. Sources tracked in companion .sources.json file.