
IVF Index — Partitioning Vector Space with K-Means for Fast Search

A visual deep dive into the Inverted File Index (IVF): how k-means clustering partitions billion-scale vector collections into searchable regions, how the nprobe knob trades recall for speed, and how IVF's complexity works out.

Introduction

1 billion vectors.
Search only 0.1% of them.

IVF (Inverted File Index) applies k-means clustering to partition vector space. At query time, you search only the nearest clusters — scanning ~0.1% of all vectors while maintaining 95%+ recall. It’s the same “inverted file” idea from text search, applied to vector space.
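As a concrete illustration of this flow, here is a minimal sketch using the FAISS library's IndexIVFFlat (assuming faiss-cpu and numpy are installed; the dataset size, dimensionality, and parameter values are illustrative, not taken from this article):

```python
import numpy as np
import faiss  # assumes the faiss-cpu package is installed

d = 128                                              # illustrative vector dimensionality
nlist = 1024                                         # K: number of k-means clusters
xb = np.random.rand(100_000, d).astype("float32")    # database vectors
xq = np.random.rand(5, d).astype("float32")          # query vectors

quantizer = faiss.IndexFlatL2(d)                     # assigns vectors and queries to centroids
index = faiss.IndexIVFFlat(quantizer, d, nlist)
index.train(xb)                                      # offline: k-means partitioning
index.add(xb)                                        # fill the inverted lists

index.nprobe = 16                                    # query-time knob: how many clusters to scan
distances, ids = index.search(xq, 10)                # exact scan inside the 16 nearest cells only
```

Because nprobe is set on the index at query time, it can be raised for harder queries without retraining or re-adding anything.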

The sections below break down how k-means enables this roughly 1000× search speedup.

IVF Concept

IVF: Inverted File Index for Vectors

[Figure: IVF — Partition → Probe → Search]
1. Offline — Partition: split the collection into K clusters via k-means.
2. Query — Probe: find the nprobe centroids closest to the query (e.g. nprobe = 2 searches the top 2 clusters).
3. Scan subset: with 1B total vectors, K = 10,000 clusters and nprobe = 10, only ~1M vectors (0.1% of the total) are scanned — roughly a 1000× speedup at ~95% recall. Raising nprobe raises recall and latency.
IVF partitions vector space into Voronoi cells using k-means centroids. At query time, only nearby cells are searched.
🟢 Quick Check

In IVF, what determines which vectors are searched at query time?

IVF Math

IVF complexity analysis

🏗️ Index Building

K-means clustering runs over all N vectors to find K centroids. Each iteration assigns vectors to nearest centroids and recomputes means, typically taking 20-50 iterations.

O(N × K × I), I = k-means iterations
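To make that build cost concrete, here is a toy, from-scratch sketch of the offline step in plain NumPy (Lloyd's k-means; the function and variable names are mine, and real systems typically train the centroids on a sample rather than all N vectors):

```python
import numpy as np

def build_ivf(xb, K, iters=25, seed=0):
    """Toy IVF build: run Lloyd's k-means, then bucket every vector into the
    inverted list of its nearest centroid. Each iteration computes an N x K
    distance matrix, so the build costs O(N * K * d * iters)."""
    rng = np.random.default_rng(seed)
    N, d = xb.shape
    centroids = xb[rng.choice(N, size=K, replace=False)].copy()  # random init
    for _ in range(iters):
        # assignment step: squared L2 distance of every vector to every centroid
        d2 = ((xb ** 2).sum(1, keepdims=True)
              - 2.0 * xb @ centroids.T
              + (centroids ** 2).sum(1))
        assign = d2.argmin(axis=1)
        # update step: each centroid becomes the mean of its assigned vectors
        for c in range(K):
            members = xb[assign == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    inverted_lists = [np.flatnonzero(assign == c) for c in range(K)]
    return centroids, inverted_lists
```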
💾 Space

Stores all N original vectors plus K centroid vectors. Since K is much smaller than N, the centroid overhead is negligible — space is dominated by the vectors themselves.

O(N × d + K × d)
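A quick back-of-envelope check shows why the centroid term is negligible (assumed values: 768-dimensional float32 embeddings, N = 1B, K ≈ √N; none of these numbers come from the article):

```python
N, K, d, bytes_per_dim = 1_000_000_000, 31_623, 768, 4   # float32 vectors
vectors   = N * d * bytes_per_dim    # ≈ 3.07 TB: dominates the footprint
centroids = K * d * bytes_per_dim    # ≈ 97 MB: negligible by comparison
print(f"{vectors / 1e12:.2f} TB of vectors, {centroids / 1e6:.0f} MB of centroids")
```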
🔍 Query Time

First compares the query to all K centroids (cheap), then exhaustively scans nprobe clusters. The two costs are balanced when K ≈ √N.

O(K × d + nprobe × (N/K) × d)
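A matching toy query routine (using the build_ivf output from the sketch above) makes the two terms visible: a K × d centroid pass followed by an exact scan of roughly nprobe × N/K vectors:

```python
import numpy as np

def ivf_search(q, centroids, inverted_lists, xb, nprobe=10, k=10):
    """Toy IVF query. Cost is roughly O(K * d) for the centroid pass plus
    O(nprobe * (N / K) * d) for the exhaustive scan of the probed cells."""
    # 1) coarse pass: distance from the query to all K centroids (cheap)
    cdist = ((centroids - q) ** 2).sum(axis=1)
    probe = np.argsort(cdist)[:nprobe]                  # nprobe nearest cells
    # 2) fine pass: exact scan of every vector stored in those cells
    cand = np.concatenate([inverted_lists[c] for c in probe])
    dist = ((xb[cand] - q) ** 2).sum(axis=1)
    top = np.argsort(dist)[:k]
    return cand[top], dist[top]
```

With these two pieces, varying nprobe at query time trades recall for latency exactly as described, with no index rebuild.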
⚖️ Optimal K = √N

Setting K to the square root of N equalizes centroid comparison cost and cluster scan cost, yielding the best balance of speed and recall.

Query becomes O(√N × d)
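A small cost model makes the balance visible. Counting distance computations as K × d for the centroid pass plus nprobe × (N/K) × d for the scan, and treating nprobe as a small constant, the total is minimized near K ≈ √N (the exact minimum sits at K = √(nprobe × N)). The values below are illustrative:

```python
N, d, nprobe = 1_000_000_000, 768, 10      # illustrative values

def query_cost(K):
    return K * d + nprobe * (N / K) * d    # centroid pass + cluster scan

for K in (1_000, 10_000, 31_623, 100_000, 1_000_000):
    print(f"K={K:>9,}  ~{query_cost(K):.2e} distance multiply-adds")
# the curve is flat between K ~ sqrt(N) and sqrt(nprobe * N), and blows up at both extremes
```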
📊 Scale Example

With 1 billion vectors, K=31,623 and nprobe=10, only ~316K vectors are scanned per query — 99.97% of the database is skipped. Typical GPU latency: 1-5ms.
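The arithmetic behind that example, for anyone who wants to verify it (values taken from the paragraph above):

```python
N, K, nprobe = 1_000_000_000, 31_623, 10
scanned = nprobe * (N // K)                       # ≈ 316,000 vectors per query
print(f"{scanned:,} scanned, {100 * (1 - scanned / N):.2f}% skipped")  # ≈ 99.97%
```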

🟡 Checkpoint

You have 100M vectors indexed with IVF (K=1000 clusters). You set nprobe=1 and get 85% recall. How do you improve recall WITHOUT rebuilding the index?

🟡 Checkpoint

You're building an IVF index for 100M vectors. You set K=100 clusters. What happens?

🎓 What You Now Know

IVF partitions space with k-means — cluster all vectors offline, then search only nearby clusters at query time.

nprobe is the recall-speed knob — adjustable at query time without rebuilding the index. Sweet spot: nprobe ≈ √K.

Query complexity is O(√N × d) — when K = √N, IVF balances centroid search and cluster scan cost.

IVF-PQ adds 32× compression — Product Quantization reduces memory from terabytes to gigabytes with ~5% recall loss.

IVF is effective but HNSW often achieves higher recall at the same latency — at the cost of more memory. ⚡
