HNSW — Hierarchical Navigable Small World Graphs for Vector Search
A visual deep dive into HNSW, the dominant graph-based index for vector search. Understand the multi-layer graph structure, the M and ef parameters, complexity analysis, and how it compares to IVF, LSH, and brute force.
🕸️
A graph where every hop
gets you closer.
HNSW (Hierarchical Navigable Small World) builds a multi-layer graph where each vector is a node and edges connect nearby vectors. Searching means “hopping” through the graph — each hop brings you closer to the query. The result: O(log N) query time with the highest recall of any ANN algorithm. The tradeoff? It’s memory-hungry.
↓ Scroll to understand the graph-based index that powers most vector databases
HNSW: The Graph-Based Index
You're tuning HNSW for a latency-sensitive application. You set ef=5 (very low beam width). What problem will you likely encounter?
💡 Think about what happens in a greedy search when the best path requires going 'uphill' temporarily.
ef (exploration factor) controls how many candidates the search tracks simultaneously. With ef=5, the algorithm greedily follows the 5 closest nodes it's found so far. If the true nearest neighbor requires a 'detour' through a slightly farther node, the greedy search won't explore it. This is the local optima problem in graph search. Higher ef (e.g., 100-200) maintains a wider beam of candidates, allowing the search to explore multiple promising paths. The tradeoff: ef=5 might give 85% recall in 0.1ms, while ef=200 gives 99.5% recall in 5ms. For production, ef=64-128 is the typical sweet spot.
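To make the ef tradeoff concrete, here is a minimal sketch (plain Python and NumPy, not a real HNSW library) of a single-layer best-first search that keeps at most ef candidates. The graph, vectors, and the search_layer function name are illustrative assumptions, not an actual library API.

```python
import heapq
import numpy as np

def search_layer(query, entry, vectors, neighbors, ef):
    """Best-first search over one graph layer, keeping at most ef result candidates."""
    dist = lambda i: float(np.linalg.norm(vectors[i] - query))
    visited = {entry}
    candidates = [(dist(entry), entry)]        # min-heap: closest unexplored node first
    results = [(-dist(entry), entry)]          # max-heap holding the ef best nodes found so far
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0]:                 # closest candidate is already worse than
            break                              # the worst of our ef results: stop searching
        for nb in neighbors[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d_nb = dist(nb)
            if len(results) < ef or d_nb < -results[0][0]:
                heapq.heappush(candidates, (d_nb, nb))
                heapq.heappush(results, (-d_nb, nb))
                if len(results) > ef:
                    heapq.heappop(results)     # evict the current worst result
    return sorted((-d, n) for d, n in results) # (distance, node) pairs, best first

# Toy data: with a tiny ef the search can stall in a local optimum; a larger ef
# keeps more paths alive and is more likely to reach the true nearest neighbor.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(50, 8))
neighbors = {i: list(rng.choice(50, size=6, replace=False)) for i in range(50)}
query = rng.normal(size=8)
print(search_layer(query, entry=0, vectors=vectors, neighbors=neighbors, ef=5)[:3])
print(search_layer(query, entry=0, vectors=vectors, neighbors=neighbors, ef=50)[:3])
```

A wider beam only pays off because the extra candidates let the search back out of dead ends; that is exactly the cost you trade latency for.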
HNSW complexity
Build: O(N × log(N) × M). Each of N vectors is inserted into the graph, connecting to M neighbors at each layer. The hierarchical structure means insertion traverses O(log N) layers.
Memory: O(N × M × number_of_layers). Every node stores M edges at each layer, and the number of layers grows logarithmically with N. This makes HNSW memory-hungry — every edge must be explicitly stored.
Search: O(log(N) × ef × M). Search traverses log(N) layers, tracking ef candidates at each step, and checking M neighbors per candidate. The logarithmic layer traversal is what gives HNSW its speed.
Memory at scale: N × M × 4 bytes × log(N) ≈ 2 TB. At 1 billion vectors with M=16, the graph index alone requires roughly 2 TB of RAM. This is HNSW's main cost — every edge is an explicit pointer in memory.
Latency at scale: For 100M vectors, typical query latency is 1-10ms — much faster than IVF at equivalent recall, but at the cost of roughly 4× more memory.
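As a sanity check on the memory figure, here is the back-of-the-envelope arithmetic in a few lines of Python, using the article's simplified model (every edge stored as a 4-byte neighbor id on every layer):

```python
# Back-of-the-envelope check of the "about 2 TB" memory figure above,
# using a simplified model: a 4-byte neighbor id per edge on every layer.
import math

N = 1_000_000_000            # 1 billion vectors
M = 16                       # edges per node per layer
layers = math.log2(N)        # about 30 layers in this simplified model
edge_bytes = N * M * 4 * layers
print(f"layers ~ {layers:.0f}")                       # ~ 30
print(f"graph edges ~ {edge_bytes / 1e12:.1f} TB")    # ~ 1.9 TB
```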
HNSW query time is O(log N × ef × M). If you double the database size from 500M to 1B vectors, how does query latency change?
💡 Calculate log₂(500M) and log₂(1B). How different are they?
This is the magic of logarithmic complexity. log₂(500M) ≈ 29, log₂(1B) ≈ 30. That's ONE additional layer to traverse out of 30 — about 3.4% more work. Compare this to IVF where query time is O(√N): doubling N increases query time by √2 ≈ 41%. Or brute force at O(N): doubling N doubles query time. HNSW's logarithmic scaling is why it dominates at large scale — going from 100M to 10B vectors only adds ~7 extra hops. The constant factors (ef, M) matter more than N for practical latency.
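You can verify the scaling comparison with a few lines of arithmetic (relative query work only, ignoring constant factors like ef and M):

```python
# Relative query work when the database doubles from 500M to 1B vectors,
# per the complexity formulas in this article (constant factors ignored).
import math

n1, n2 = 500_000_000, 1_000_000_000
hnsw  = math.log2(n2) / math.log2(n1)    # O(log N)
ivf   = math.sqrt(n2) / math.sqrt(n1)    # O(sqrt N)
brute = n2 / n1                          # O(N)
print(f"HNSW:        +{(hnsw - 1) * 100:.1f}% work")   # ~ +3.5%
print(f"IVF:         +{(ivf - 1) * 100:.1f}% work")    # ~ +41.4%
print(f"brute force: +{(brute - 1) * 100:.0f}% work")  # +100%
```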
The Big Complexity Comparison
You have 500M 768-dim vectors. HNSW with M=32 achieves 99% recall but requires ~500GB RAM for the graph alone. You only have 64GB RAM. What do you do?
💡 Which index type is specifically designed to minimize memory usage?
IVF-PQ is designed for exactly this scenario. Product Quantization (PQ) splits each 768-dim vector into 96 sub-vectors of 8 dimensions each, then quantizes each sub-vector to 1 byte (256 codebook entries). Result: 768×4 bytes = 3072 bytes → 96 bytes per vector (32× compression). 500M × 96 bytes = 48GB — fits! The accuracy cost: PQ introduces quantization error, so recall drops from ~99% (HNSW) to ~92%. Sharding across machines also works, but requires 8 machines. IVF-PQ is the single-machine winner.
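The compression arithmetic is easy to check directly; the sketch below just reproduces the numbers from the answer (numbers only, not an IVF-PQ implementation):

```python
# Quick sanity check of the PQ compression arithmetic
# (numbers only; this is not an IVF-PQ implementation).
dim, n_vectors = 768, 500_000_000
n_subvectors, bytes_per_code = 96, 1               # 96 sub-vectors, 1 byte each (256 centroids)

raw_bytes_per_vec = dim * 4                        # float32: 3072 bytes per vector
pq_bytes_per_vec = n_subvectors * bytes_per_code   # 96 bytes per vector
print(f"compression: {raw_bytes_per_vec // pq_bytes_per_vec}x")       # 32x
print(f"raw vectors: {n_vectors * raw_bytes_per_vec / 1e9:.0f} GB")   # 1536 GB
print(f"PQ codes:    {n_vectors * pq_bytes_per_vec / 1e9:.0f} GB")    # 48 GB, fits in 64 GB
```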
You’ve seen the memory–accuracy tradeoff at scale. Now let’s zoom into why HNSW’s hierarchy matters — the layered structure is the key insight that separates HNSW from a flat graph and gives it O(log N) search.
Why does HNSW use MULTIPLE layers instead of a single flat graph?
💡 Compare HNSW to a skip list. What does each layer add?
A flat Navigable Small World (NSW) graph requires O(√N) hops because you can only take short steps between nearby nodes. Adding hierarchy creates a 'zoom' effect: Layer 2 (~N/16 nodes) lets you jump across large distances in a few hops, Layer 1 (~N/4 nodes) refines the position, and Layer 0 (all N nodes) does the final precise search. Each layer shrinks the search space by a constant factor, giving O(log N) total hops. This is the same principle as skip lists in databases, B-trees in storage systems, and express/local train lines in transit. The hierarchy is what makes HNSW the fastest ANN algorithm at high recall.
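Putting the hierarchy into code makes the 'zoom' effect explicit. This minimal sketch assumes the search_layer helper sketched earlier (greedy best-first search restricted to one layer's edges); the names and data layout are illustrative assumptions, not a library API.

```python
def hnsw_search(query, entry_point, vectors, layer_neighbors, ef):
    """Hierarchical descent: layer_neighbors[L] maps node -> neighbor list on layer L.

    Higher layers contain exponentially fewer nodes; layer 0 contains all of them.
    """
    entry = entry_point
    # Upper layers: coarse "zoom". A beam of ef=1 is enough here, because the
    # only goal is to land on a good entry point for the next layer down.
    for layer in range(len(layer_neighbors) - 1, 0, -1):
        entry = search_layer(query, entry, vectors, layer_neighbors[layer], ef=1)[0][1]
    # Layer 0: the full graph, searched with the real ef to get high recall.
    return search_layer(query, entry, vectors, layer_neighbors[0], ef=ef)
```

The upper layers play the role of the express lines: they cross most of the dataset in a handful of hops, and only layer 0 pays the full ef-wide search cost.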
🎓 What You Now Know
✓ HNSW builds a navigable graph — each vector is a node, edges connect nearby vectors, search means hopping through the graph.
✓ M controls the graph density — more edges = higher recall but more memory. Sweet spot: M = 16-64.
✓ ef controls the search beam width — higher ef explores more paths for better recall at the cost of latency.
✓ O(log N) query time, but memory-hungry — HNSW gives the highest recall but requires storing the entire graph in RAM.
✓ When RAM is tight, use IVF-PQ instead — 32× compression with ~92% recall fits billion-scale datasets on a single machine.
HNSW dominates vector search, but production systems combine it with BM25 keyword search and cross-encoder reranking for best results. ⚡
↗ Keep Learning
Approximate Nearest Neighbor Search — Trading 1% Accuracy for 1000× Speed
A visual deep dive into ANN search. Why brute-force nearest neighbor fails at scale, how approximate methods achieve 99% recall with logarithmic query time, and the fundamental accuracy-speed tradeoff behind every vector search system.
IVF Index — Partitioning Vector Space with K-Means for Fast Search
A visual deep dive into the Inverted File Index (IVF). How k-means clustering partitions billion-scale vector collections into searchable regions, the nprobe recall-speed knob, and IVF complexity analysis.
Vector Databases — Search by Meaning, Not Keywords
A visual deep dive into vector databases. From embeddings to ANN search to HNSW — understand how AI-powered search finds what you actually mean, not just what you typed.
K-Nearest Neighbors — The Algorithm with No Training Step
A scroll-driven visual deep dive into KNN. Learn how the laziest algorithm in ML works, why distance metrics matter, and how the curse of dimensionality kills it.