IVF Index — Partitioning Vector Space with K-Means for Fast Search
A visual deep dive into the Inverted File Index (IVF). How k-means clustering partitions billion-scale vector collections into searchable regions, the nprobe recall-speed knob, and IVF complexity analysis.
1 billion vectors.
Search only 0.1% of them.
IVF (Inverted File Index) applies k-means clustering to partition vector space. At query time, you search only the nearest clusters — scanning ~0.1% of all vectors while maintaining 95%+ recall. It’s the same “inverted file” idea from text search, applied to vector space.
↓ Scroll to understand how k-means enables 1000× search speedup
IVF: Inverted File Index for Vectors
In IVF, what determines which vectors are searched at query time?
💡 Think about what nprobe controls in the IVF pipeline.
IVF's core insight is space partitioning. During indexing, k-means assigns every vector to its nearest centroid, creating K clusters (Voronoi cells). At query time, you compute distance from the query to all K centroids (cheap — K is small, typically √N), pick the closest nprobe centroids, and exhaustively search only those clusters. With K=10,000 and nprobe=10, you search only 0.1% of vectors. The key trade-off: vectors near cluster BOUNDARIES might belong to a neighboring cluster you didn't probe — this is why nprobe>1 improves recall.
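The whole pipeline fits in a few lines of NumPy. The sketch below is an illustrative toy, not a production index (libraries like FAISS add training heuristics and SIMD scanning); `build_ivf`, `ivf_search`, and the short Lloyd's loop standing in for full k-means are all assumptions of this sketch.

```python
# Toy IVF in NumPy — an illustrative sketch, not a production index.
# Assumptions: L2 distance; a short Lloyd's loop stands in for full k-means;
# build_ivf / ivf_search are hypothetical names for this sketch.
import numpy as np

rng = np.random.default_rng(0)

def build_ivf(X, K, iters=10):
    """Partition X into K Voronoi cells; return centroids + inverted lists."""
    centroids = X[rng.choice(len(X), K, replace=False)].copy()
    for _ in range(iters):  # Lloyd's algorithm: assign, then re-center
        assign = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1).argmin(1)
        for k in range(K):
            members = X[assign == k]
            if len(members):
                centroids[k] = members.mean(0)
    assign = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1).argmin(1)
    return centroids, {k: np.where(assign == k)[0] for k in range(K)}

def ivf_search(q, X, centroids, lists, nprobe=1, topk=5):
    """Compare q to all K centroids, then scan only the nprobe nearest cells."""
    probe = ((centroids - q) ** 2).sum(1).argsort()[:nprobe]
    cand = np.concatenate([lists[int(k)] for k in probe])
    dist = ((X[cand] - q) ** 2).sum(1)
    return cand[dist.argsort()[:topk]]

N, d = 2000, 16
X = rng.normal(size=(N, d)).astype(np.float32)
centroids, lists = build_ivf(X, K=45)          # K ≈ √N
q = rng.normal(size=d).astype(np.float32)
print(ivf_search(q, X, centroids, lists, nprobe=4))
```

A handy correctness check: with nprobe equal to K, every cell is probed and the search degenerates to exact brute force.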
IVF complexity analysis
Build: O(N × K × I), where I is the number of k-means iterations. K-means clustering runs over all N vectors to find K centroids; each iteration assigns every vector to its nearest centroid and recomputes the means, typically converging in 20-50 iterations.
Space: O(N × d + K × d). Stores all N original vectors plus the K centroid vectors. Since K is much smaller than N, the centroid overhead is negligible — space is dominated by the vectors themselves.
Query: O(K × d + nprobe × (N/K) × d). First compares the query to all K centroids (cheap), then exhaustively scans the nprobe nearest clusters. The two costs are balanced when K ≈ √N.
Optimal K ≈ √N: query becomes O(√N × d). Setting K to the square root of N equalizes the centroid-comparison cost and the cluster-scan cost, yielding the best balance of speed and recall. With 1 billion vectors, K = 31,623 and nprobe = 10, only ~316K vectors are scanned per query — 99.97% of the database is skipped. Typical GPU latency: 1-5 ms.
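The billion-vector numbers are easy to verify with back-of-envelope arithmetic:

```python
# Back-of-envelope check of the billion-vector numbers above.
import math

N = 1_000_000_000
K = round(math.sqrt(N))        # K = √N ≈ 31,623 clusters
nprobe = 10
scanned = nprobe * (N // K)    # vectors exhaustively compared per query
print(f"K={K}, scanned={scanned:,} ({scanned / N:.4%} of the database)")
```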
You have 100M vectors indexed with IVF (K=1000 clusters). You set nprobe=1 and get 85% recall. How do you improve recall WITHOUT rebuilding the index?
💡 IVF has a parameter that controls how many clusters to search per query...
nprobe is IVF's recall-speed knob. nprobe=1 searches only the single closest cluster — fast but misses vectors near cluster boundaries. nprobe=10 searches 10 clusters, covering 10× more vectors and catching boundary cases. The trade-off is linear: nprobe=10 takes ~10× longer than nprobe=1. In practice, the sweet spot is usually nprobe ≈ √K. For K=1000, that's nprobe ≈ 32, which typically gives 98%+ recall. The beauty of IVF is that this knob can be tuned at query time — no index rebuilding needed.
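The knob is easy to demonstrate empirically. The sketch below builds a deliberately simplified IVF on random data — centroids are random samples rather than trained k-means, which is enough to expose boundary misses — and measures recall against brute-force ground truth as nprobe grows. All names and parameters here are illustrative.

```python
# Empirical recall-vs-nprobe sweep on random data (illustrative sketch).
# Simplification: centroids are random samples, not trained k-means —
# enough to show how probing more cells recovers boundary neighbors.
import numpy as np

rng = np.random.default_rng(1)
N, d, K, topk = 5000, 8, 70, 10
X = rng.normal(size=(N, d)).astype(np.float32)
Q = rng.normal(size=(20, d)).astype(np.float32)

centroids = X[rng.choice(N, K, replace=False)]
assign = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1).argmin(1)

def ivf_topk(q, nprobe):
    probe = ((centroids - q) ** 2).sum(1).argsort()[:nprobe]
    cand = np.where(np.isin(assign, probe))[0]
    dist = ((X[cand] - q) ** 2).sum(1)
    return set(cand[dist.argsort()[:topk]].tolist())

truth = [set(((X - q) ** 2).sum(1).argsort()[:topk].tolist()) for q in Q]
recalls = {}
for nprobe in (1, 4, 16):
    recalls[nprobe] = float(np.mean(
        [len(ivf_topk(q, nprobe) & t) / topk for q, t in zip(Q, truth)]))
    print(f"nprobe={nprobe}: recall@{topk} = {recalls[nprobe]:.2f}")
```

Because the candidate set at nprobe=16 is a superset of the one at nprobe=1, recall can only go up as nprobe grows — exactly the query-time trade-off described above.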
You're building an IVF index for 100M vectors. You set K=100 clusters. What happens?
💡 Calculate how many vectors per cluster with K=100 and N=100M.
K controls the granularity of partitioning. With K=100 and N=100M, each cluster holds ~1M vectors. Searching nprobe=10 clusters means scanning 10M vectors — 10% of the database. That's a 10× speedup, not 1000×. The optimal K ≈ √N balances centroid search cost and cluster scan cost: √(100M) ≈ 10,000. With K=10,000, each cluster holds ~10,000 vectors, and nprobe=10 scans only 100,000 vectors (0.1%). Rule of thumb: K should be between √N and 4√N.
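The cluster-size arithmetic behind the two choices of K, as a quick sanity check:

```python
# Cluster-size arithmetic for N = 100M at two choices of K.
N, nprobe = 100_000_000, 10
frac = {}
for K in (100, 10_000):
    per_cluster = N // K
    frac[K] = nprobe * per_cluster / N
    print(f"K={K:>6}: ~{per_cluster:,} vectors/cluster, "
          f"nprobe={nprobe} scans {frac[K]:.1%} of the DB")
```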
🎓 What You Now Know
✓ IVF partitions space with k-means — cluster all vectors offline, then search only nearby clusters at query time.
✓ nprobe is the recall-speed knob — adjustable at query time without rebuilding the index. Sweet spot: nprobe ≈ √K.
✓ Query complexity is O(√N × d) — when K = √N, IVF balances centroid search and cluster scan cost.
✓ IVF-PQ adds 32× compression — Product Quantization reduces memory from terabytes to gigabytes with ~5% recall loss.
IVF is effective but HNSW often achieves higher recall at the same latency — at the cost of more memory. ⚡
↗ Keep Learning
Approximate Nearest Neighbor Search — Trading 1% Accuracy for 1000× Speed
A visual deep dive into ANN search. Why brute-force nearest neighbor fails at scale, how approximate methods achieve 99% recall with logarithmic query time, and the fundamental accuracy-speed tradeoff behind every vector search system.
HNSW — Hierarchical Navigable Small World Graphs for Vector Search
A visual deep dive into HNSW, the dominant graph-based index for vector search. Understand the multi-layer graph structure, the M and ef parameters, complexity analysis, and how it compares to IVF, LSH, and brute force.
Vector Databases — Search by Meaning, Not Keywords
A visual deep dive into vector databases. From embeddings to ANN search to HNSW — understand how AI-powered search finds what you actually mean, not just what you typed.
K-Means Clustering — Grouping Data Without Labels
A scroll-driven visual deep dive into K-Means clustering. Learn the iterative algorithm, choosing K with the elbow method, limitations, and when to use alternatives.