
pgvector vs Qdrant for a 5M-Vector RAG System

Date: 2026-05-24
Type: Research
Status: Tier-S comparison of pgvector (PostgreSQL) vs Qdrant for a 5 million vector RAG deployment
Sources: pgvector-vs-qdrant-5m-vector-rag-2026-05-24.sources.json


TL;DR

For 5M vectors, either option works, and works well. The choice is an operations decision, not a performance decision (Afnexis, 2026). In Tiger Data's benchmark, both deliver sub-100ms query latency at 99% recall even at 10× your scale (50M vectors). The tiebreakers are your existing stack, team skills, and growth trajectory.


1. Benchmark Numbers at Scale

The most rigorous head-to-head data comes from Tiger Data's 50M vector (768-dim Cohere embeddings) benchmark using ANN-benchmarks on identical AWS r6id.4xlarge (16 vCPU, 128GB RAM) hardware:

Query Latency (99% Recall)

| Metric | Postgres + pgvector + pgvectorscale | Qdrant | Δ |
| --- | --- | --- | --- |
| p50 | 31.07 ms | 30.75 ms | Qdrant 1% faster |
| p95 | 60.42 ms | 36.73 ms | Qdrant 39% faster |
| p99 | 74.60 ms | 38.71 ms | Qdrant 48% faster |

Both are sub-100ms. Qdrant's variance is tighter — critical if you have tail-latency SLAs.

Source: Tiger Data, Apr 2025 — 50M vectors, 768-dim, binary quantization on both
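
The Δ column can be reproduced from the raw latencies. The sketch below assumes "X% faster" means the relative reduction in latency against the Postgres figure; the helper name `percent_faster` is illustrative and not taken from the benchmark.

```python
# Latency figures (ms) at 99% recall, copied from the table above.
latency_ms = {
    "p50": {"postgres": 31.07, "qdrant": 30.75},
    "p95": {"postgres": 60.42, "qdrant": 36.73},
    "p99": {"postgres": 74.60, "qdrant": 38.71},
}


def percent_faster(slower_ms: float, faster_ms: float) -> float:
    """Relative latency reduction of the faster system, in percent."""
    return (slower_ms - faster_ms) / slower_ms * 100


# Round to whole percentages, as the table does.
qdrant_advantage = {
    percentile: round(percent_faster(row["postgres"], row["qdrant"]))
    for percentile, row in latency_ms.items()
}
print(qdrant_advantage)
```

The gap widening from p50 to p99 is what the "tighter variance" remark refers to.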

Query Throughput (QPS) at 99% Recall

| System | QPS | Ratio |
| --- | --- | --- |
| Postgres + pgvectorscale | 471.57 | 11.4× Qdrant |
| Qdrant | 41.47 | baseline |

This is the counter-intuitive result: Postgres handles far more concurrent queries on a single node. Tiger Data attributes this to Postgres's mature concurrency control vs. Qdrant's relative immaturity in concurrent read optimization.

Source: Tiger Data; confirmed by LetsDataScience

At Lower Recall (90%)

| Metric | Postgres + pgvectorscale | Qdrant |
| --- | --- | --- |
| p50 latency | 9.54 ms | 4.74 ms |
| p99 latency | 15.73 ms | 5.79 ms |
| QPS | 1,590 | 361 |

At 90% recall, Qdrant wins on latency (about 2× faster at p50 and 2.7× at p99), but Postgres still dominates throughput (4.4× the QPS).
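
As a quick check on the ratios quoted in this section, the following sketch recomputes them from the published figures. The dictionary layout and the `ratio` helper are illustrative assumptions; only the numbers come from the tables.

```python
# (Postgres + pgvectorscale, Qdrant) pairs copied from the tables in this section.
qps = {"99% recall": (471.57, 41.47), "90% recall": (1590, 361)}
latency_90_ms = {"p50": (9.54, 4.74), "p99": (15.73, 5.79)}


def ratio(postgres_value: float, qdrant_value: float) -> float:
    """Postgres figure divided by the Qdrant figure, to one decimal place."""
    return round(postgres_value / qdrant_value, 1)


# Throughput: higher is better, so a ratio above 1 favours Postgres.
throughput_ratio = {recall: ratio(pg, qd) for recall, (pg, qd) in qps.items()}

# Latency: lower is better, so a ratio above 1 favours Qdrant.
latency_ratio = {pct: ratio(pg, qd) for pct, (pg, qd) in latency_90_ms.items()}

print(throughput_ratio)
print(latency_ratio)
```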

Index Build Time

| System | Build Time (50M vectors) |
| --- | --- |
| Qdrant | ~3.3 hours |
| Postgres + pgvectorscale | ~11.1 hours |

Qdrant's multi-threaded Rust indexer is 3.4× faster. pgvectorscale's indexer is currently single-threaded (parallel builds are in development).

Extrapolating to 5M Vectors

At 5M (10× smaller), expect roughly the following (extrapolated, not measured):

- Latency: Roughly 2-5ms for both systems at 99% recall (sub-linear scaling)
- QPS: 2,000-5,000+ QPS for both (more than any typical RAG app needs)
- Index build: ~20-40 minutes for Qdrant, ~1 hour for pgvectorscale
- Index rebuild: About an hour at most rather than most of a day; manageable for both
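
The index build estimates follow from scaling the 50M-vector build times down by 10×. This minimal sketch assumes build time is proportional to vector count, which is a simplification: graph index builds usually scale somewhat worse than linearly, so a smaller dataset may do slightly better than this.

```python
MEASURED_VECTORS = 50_000_000
TARGET_VECTORS = 5_000_000

# Build times (hours) at 50M vectors, from the Index Build Time table.
build_hours_at_50m = {"qdrant": 3.3, "pgvectorscale": 11.1}


def scaled_build_minutes(hours: float, measured_n: int, target_n: int) -> float:
    """Linear extrapolation of build time; an assumption, not a measurement."""
    return hours * 60 * target_n / measured_n


estimated_minutes = {
    system: round(scaled_build_minutes(hours, MEASURED_VECTORS, TARGET_VECTORS))
    for system, hours in build_hours_at_50m.items()
}
print(estimated_minutes)
```

Under that assumption Qdrant lands at the low end of the 20-40 minute range and pgvectorscale at a little over an hour.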


2. Index Types and Algorithms

pgvector / pgvectorscale

| Index | Best For | Trade-offs |
| --- | --- | --- |
| HNSW (pgvector) | General-purpose, high recall | Must fit in RAM; memory-hungry at 5M+ |
| IVFFlat (pgvector) | Fast builds, low accuracy | Degrades as data changes; legacy choice |
| StreamingDiskANN (pgvectorscale) | Large-scale, cost-efficient | Keeps graph on NVMe, small RAM footprint; newer, fewer battle scars |

For 5M vectors at 768-dim, HNSW needs ~30-35GB RAM (raw vectors ≈ 15GB + graph overhead ≈ 15-20GB). StreamingDiskANN can cut this to 8-16GB by offloading to SSD.

Source: Instaclustr, Feb 2026; Tiger Data
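
The raw-vector figure is straightforward arithmetic: vector count × dimensions × 4 bytes per float32. The sketch below reproduces it and adds the graph overhead range quoted above; that range is taken from the text as given, not derived here.

```python
NUM_VECTORS = 5_000_000
DIMENSIONS = 768
BYTES_PER_FLOAT32 = 4

# Raw storage for uncompressed float32 embeddings.
raw_bytes = NUM_VECTORS * DIMENSIONS * BYTES_PER_FLOAT32
raw_gb = raw_bytes / 1e9

# Graph overhead range (GB) as quoted in the text; treated as an input.
GRAPH_OVERHEAD_GB = (15, 20)

hnsw_total_gb = tuple(raw_gb + overhead for overhead in GRAPH_OVERHEAD_GB)

print(f"raw vectors: {raw_gb:.2f} GB")
print(f"HNSW total:  {hnsw_total_gb[0]:.1f}-{hnsw_total_gb[1]:.1f} GB")
```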

Qdrant

| Feature | Details |
| --- | --- |
| Optimized HNSW | ACORN algorithm for filtered search; SIMD-accelerated in Rust |
| Quantization | Binary (BQ), Scalar (SQ), Product (PQ); native, multi-level |
| On-disk storage | Vectors on disk, graph in RAM; 3-5× less RAM than pgvector HNSW |
| GPU acceleration | Available for index builds (late 2025+) |

At 5M vectors with BQ, Qdrant can operate in 4-8GB RAM vs pgvector's 32-64GB for full HNSW.

Source: Elestio, Apr 2026; Gemini research synthesis
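
To see why quantization shrinks the RAM footprint so much, it helps to compare bits per dimension. The sketch assumes binary quantization stores 1 bit per dimension and scalar quantization stores one 8-bit integer per dimension. It counts only the vector payload, so the HNSW graph, payload indexes, and process overhead come on top, which is presumably why the 4-8GB figure above is well beyond the binary-quantized size.

```python
NUM_VECTORS = 5_000_000
DIMENSIONS = 768

# Bits stored per dimension under each representation (assumed encodings).
BITS_PER_DIM = {"float32": 32, "scalar_int8": 8, "binary": 1}


def vector_store_gb(bits_per_dim: int) -> float:
    """Size of the vector payload alone, in decimal gigabytes."""
    return NUM_VECTORS * DIMENSIONS * bits_per_dim / 8 / 1e9


sizes_gb = {name: round(vector_store_gb(bits), 2) for name, bits in BITS_PER_DIM.items()}
compression_vs_float32 = {
    name: BITS_PER_DIM["float32"] // bits for name, bits in BITS_PER_DIM.items()
}

print(sizes_gb)
print(compression_vs_float32)
```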


3. Metadata Filtering

This is where the two diverge meaningfully:

| Aspect | pgvector | Qdrant |
| --- | --- | --- |
| Filter syntax | SQL WHERE clauses | Dedicated payload filter API |
| Filter approach | Postgres planner chooses: a B-tree scan with an exact distance sort on the filtered subset, or an HNSW scan with the filter applied afterwards | Filters applied during HNSW traversal via bitmasks |
| Complex filters | JOINs, subqueries, full SQL expressiveness | Structured must/should/must_not conditions |
| Historical issue | "Over-filtering" problem (a post-filtered HNSW scan could return fewer rows than requested) | Purpose-built for filter-during-traversal |
| 2025-2026 fix | pgvector 0.8.0+ Iterative Scan improved filtered perf ~9× | N/A (was already good) |
| Best for | Relational queries, JOINs, SQL-native teams | Complex multi-attribute filters, real-time conditional search |

Verdict: For simple RAG (filter by doc type + tenant ID), both are fine. For complex multi-conditional metadata search, Qdrant is architecturally superior.

Source: Elestio, 2026; Tiger Data
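
The difference between the filtering strategies can be illustrated on a toy corpus. This is a brute-force stand-in for an ANN index, not the internals of pgvector or Qdrant: `Doc`, `CORPUS`, and all function names are hypothetical, distances are precomputed for brevity, and the iterative variant simply rescans with a larger window, whereas a real iterative scan would resume the existing index scan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: int
    distance: float  # precomputed distance to the query vector
    tenant: str


# Toy corpus: most of the closest documents belong to another tenant.
CORPUS = [
    Doc(1, 0.10, "other"), Doc(2, 0.12, "other"), Doc(3, 0.15, "acme"),
    Doc(4, 0.20, "other"), Doc(5, 0.25, "other"), Doc(6, 0.30, "acme"),
    Doc(7, 0.35, "acme"), Doc(8, 0.40, "other"),
]


def nearest(candidates, limit):
    """Exact nearest neighbours by distance (stands in for an index scan)."""
    return sorted(candidates, key=lambda d: d.distance)[:limit]


def post_filter(corpus, tenant, k, scan_size):
    """Fetch a fixed number of candidates, then filter: may return fewer than k."""
    return [d for d in nearest(corpus, scan_size) if d.tenant == tenant][:k]


def iterative_post_filter(corpus, tenant, k, scan_size):
    """Widen the scan until k matches are found or the corpus is exhausted."""
    while True:
        hits = post_filter(corpus, tenant, k, scan_size)
        if len(hits) >= k or scan_size >= len(corpus):
            return hits
        scan_size *= 2


def filter_during_search(corpus, tenant, k):
    """Only matching documents are ever considered as candidates."""
    return nearest([d for d in corpus if d.tenant == tenant], k)


print([d.doc_id for d in post_filter(CORPUS, "acme", k=3, scan_size=4)])
print([d.doc_id for d in iterative_post_filter(CORPUS, "acme", k=3, scan_size=4)])
print([d.doc_id for d in filter_during_search(CORPUS, "acme", k=3)])
```

With a scan window of four, plain post-filtering finds only one of the three requested matches, while the other two strategies return all three.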


4. Memory & Hardware Requirements

5M Vectors, 768 Dimensions (OpenAI/Cohere-scale)

| Resource | pgvector (HNSW) | pgvector (StreamingDiskANN) | Qdrant (BQ, on-disk) |
| --- | --- | --- | --- |
| Raw vector data | ~15 GB | ~15 GB (on SSD) | ~15 GB (on SSD) |
| Index in RAM | ~20 GB | ~2-4 GB | ~2-4 GB |
| Total RAM needed | 32-64 GB | 16-32 GB | 4-8 GB |
| SSD | Nice to have | Required (NVMe) | Required (NVMe) |
| Typical instance | r6i.2xlarge | r6i.2xlarge | c6i.xlarge |

Key insight: At 5M vectors, RAM is the primary cost driver. Qdrant's quantization + on-disk approach means you can run on a much smaller instance.

Source: Elestio, 2026; Gemini research; Second Talent, 2026


5. Operational Complexity

| Aspect | pgvector | Qdrant |
| --- | --- | --- |
| Deployment | CREATE EXTENSION vector; done | Separate Docker/K8s service |
| Existing stack | No new infra if you have Postgres | New DB, new SDK, new monitoring |
| Backups | pg_dump, unified with app data | Separate backup strategy |
| Transactions | Full ACID; vectors + metadata in one tx | Eventual consistency for vector writes |
| Data sync | None needed; it's one database | Must maintain ETL pipeline from source DB |
| Observability | pg_stat_statements, existing PG tooling | Qdrant dashboard + Prometheus metrics |
| Scaling | Scale-up (bigger instance); Citus for sharding | Built-in horizontal sharding |
| Team knowledge | Any Postgres DBA can run it | New operational domain to learn |
| Replication | Native streaming replication | Raft-based consensus (3+ nodes for HA) |

Verdict: pgvector wins operational simplicity by a wide margin. Qdrant adds a second database to your stack with its own failure modes, backup cadence, and on-call burden.

Source: Elestio, 2026; Tiger Data
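
For a sense of how small the pgvector setup is, here is a sketch of the SQL involved, held as Python strings so it can be passed to whichever driver is in use. The table name `chunks` and the columns `tenant_id`, `body`, and `embedding` are hypothetical. The statements use pgvector's documented syntax; note that the extension is installed under the name `vector`.

```python
DIMENSIONS = 768


def pgvector_setup_statements(table="chunks", dims=DIMENSIONS):
    """DDL to enable pgvector and index an embedding column (names are illustrative)."""
    return [
        "CREATE EXTENSION IF NOT EXISTS vector;",
        f"CREATE TABLE {table} ("
        "id bigserial PRIMARY KEY, "
        "tenant_id text NOT NULL, "
        "body text, "
        f"embedding vector({dims}));",
        # HNSW index using cosine distance.
        f"CREATE INDEX ON {table} USING hnsw (embedding vector_cosine_ops);",
    ]


def pgvector_search_statement(table="chunks", limit=5):
    """Filtered nearest-neighbour query; %s placeholders are bound by the driver."""
    return (
        f"SELECT id, body FROM {table} "
        "WHERE tenant_id = %s "
        "ORDER BY embedding <=> %s "  # <=> is pgvector's cosine distance operator
        f"LIMIT {limit};"
    )


for statement in pgvector_setup_statements():
    print(statement)
print(pgvector_search_statement())
```

Because the vectors live in an ordinary table, the same transaction can write the chunk text, its metadata, and its embedding together.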


6. Cost of Ownership (Monthly Estimate, Self-Hosted)

5M Vectors, 768-dim, 100 QPS read workload

| Cost Component | pgvector | Qdrant |
| --- | --- | --- |
| Compute (AWS) | r6i.2xlarge ≈ $400/mo | c6i.xlarge ≈ $170/mo |
| Engineering overhead | Minimal (existing PG expertise) | 5-10 hrs/mo maintaining second DB |
| ETL/sync pipeline | $0 | Development + maintenance cost |
| Monitoring | Existing PG tools | Additional Prometheus/Grafana |
| True TCO estimate | $400-500/mo | $500-800/mo (including labor) |

Managed cloud:

- Postgres (RDS/Aurora) with pgvector: $140-220/mo
- Qdrant Cloud: $90-180/mo

Managed Qdrant is cheaper than managed Postgres, but self-hosted pgvector wins on TCO when Postgres is already in your stack, because you add no new infrastructure.

Source: Gemini research; Tiger Data ($835/mo for 50M on r6id.4xlarge)
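
The self-hosted TCO ranges can be approximated as compute plus labor. The sources do not state an hourly rate, so the $65/hour figure below is a hypothetical value chosen because it lands close to the table's ranges, and the one hour per month for pgvector is likewise an assumption standing in for "minimal".

```python
HOURLY_RATE_USD = 65  # hypothetical engineering rate; not from the sources


def monthly_tco(compute_usd, labor_hours, hourly_rate=HOURLY_RATE_USD):
    """Monthly cost as compute plus engineering time."""
    return compute_usd + labor_hours * hourly_rate


# Compute figures and Qdrant labor hours come from the table above.
qdrant_tco = (monthly_tco(170, 5), monthly_tco(170, 10))
# pgvector labor of 0-1 hours/month is an assumed reading of "minimal".
pgvector_tco = (monthly_tco(400, 0), monthly_tco(400, 1))

print(f"pgvector: ${pgvector_tco[0]}-{pgvector_tco[1]}/mo")
print(f"Qdrant:   ${qdrant_tco[0]}-{qdrant_tco[1]}/mo")
```

The takeaway is that labor, not compute, drives the Qdrant estimate above the pgvector one; a team with a lower effective rate or existing Qdrant expertise would see the gap narrow.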


7. When to Choose Each

Choose pgvector when:

Choose Qdrant when:


8. Head-to-Head Summary

| Dimension | pgvector (Postgres) | Qdrant | Winner (at 5M) |
| --- | --- | --- | --- |
| Single-query latency | 5-30ms | 3-15ms | 🏆 Qdrant |
| Concurrent throughput (QPS) | 1,500-4,700+ | 300-400+ | 🏆 pgvector |
| Index build speed | Slower (serial) | Faster (parallel) | 🏆 Qdrant |
| RAM efficiency | 32-64 GB | 4-8 GB | 🏆 Qdrant |
| Metadata filtering | Good (SQL) | Best (native payload indexes) | 🏆 Qdrant |
| Operational simplicity | Minimal (extension) | Medium (separate service) | 🏆 pgvector |
| ACID transactions | Full support | Limited | 🏆 pgvector |
| SQL/JOIN support | Native | None | 🏆 pgvector |
| Horizontal scaling | Requires Citus | Built-in sharding | 🏆 Qdrant |
| TCO (self-hosted) | Lower | Higher (labor costs) | 🏆 pgvector |
| Ecosystem maturity | 30+ years (PG) | 4 years | 🏆 pgvector |

Score: pgvector 6, Qdrant 5 (on raw technical merit at this scale, they're extremely close; the "winner" depends on your specific constraints)


Counterpoints

  1. "pgvector is all you need" is a vendor narrative. Tiger Data (formerly Timescale) makes pgvectorscale. Their benchmark shows Postgres winning on throughput, but they note they "had trouble finding the right parameters for Qdrant" and spent weeks iterating. A Qdrant-optimized benchmark might tell a different story. (Tiger Data)

  2. The NirantK 1M benchmark tells the opposite story. At 1M vectors (2023), pgvector lagged Qdrant by 15× in throughput with worse accuracy. pgvector has improved dramatically since then (HNSW in 0.5.0, iterative scan in 0.8.0, StreamingDiskANN via pgvectorscale), but the gap at small-to-medium scale was real and may still exist without pgvectorscale. (NirantK, 2023)

  3. 5M vectors is the "sweet spot where either works." At 5M, you're below the threshold where Qdrant's architectural advantages (on-disk, horizontal sharding) become decisive. If you're at 5M today but heading to 50M, starting with Qdrant avoids a painful migration later. (Elestio, 2026)

  4. No credible counterpoints found suggesting pgvector's HNSW is faster than Qdrant's HNSW on single-query latency — Qdrant consistently wins that metric across all benchmarks.


Recommendation

For a 5M-vector RAG system in 2026:

Start with pgvector if you already have Postgres. You'll be operational in hours, not days. The performance is more than sufficient for RAG (sub-30ms queries, thousands of QPS). You can always migrate to Qdrant later if you hit scale or latency ceilings — and the SQL-first approach makes that migration straightforward since your data is already relational.

Start with Qdrant only if you have a clear, measurable requirement that pgvector can't meet: sub-10ms p99 SLAs, complex multi-attribute filtering at high QPS, or a growth plan that takes you past 20M vectors within 18 months. In those cases, Qdrant's purpose-built architecture will save you from an eventual migration.
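
The recommendation can be condensed into a small decision helper. This is a sketch of the criteria stated above, not a substitute for benchmarking your own workload: the `Requirements` fields and the `recommend` function are hypothetical names, and the "either" outcome for teams without Postgres and without a hard requirement is an assumption drawn from the TL;DR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Requirements:
    has_postgres: bool
    p99_sla_ms: Optional[float] = None  # None means no hard tail-latency SLA
    complex_filters_at_high_qps: bool = False
    expected_vectors_in_18_months: int = 5_000_000


def recommend(req: Requirements) -> str:
    """Apply the thresholds from the recommendation: 10ms p99, 20M vectors."""
    needs_qdrant = (
        (req.p99_sla_ms is not None and req.p99_sla_ms < 10)
        or req.complex_filters_at_high_qps
        or req.expected_vectors_in_18_months > 20_000_000
    )
    if needs_qdrant:
        return "qdrant"
    if req.has_postgres:
        return "pgvector"
    # Assumption: with no Postgres and no hard requirement, either works at 5M.
    return "either"


print(recommend(Requirements(has_postgres=True)))
print(recommend(Requirements(has_postgres=True, p99_sla_ms=8)))
print(recommend(Requirements(has_postgres=True, expected_vectors_in_18_months=50_000_000)))
```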


Research conducted 2026-05-24 using DuckDuckGo, Gemini CLI, and direct source scraping. Sources tracked in companion .sources.json file.