Vector Databases — Search by Meaning, Not Keywords
A visual deep dive into vector databases. From embeddings to ANN search to HNSW — understand how AI-powered search finds what you actually mean, not just what you typed.
🧠
Google doesn’t match words.
It matches meaning.
Search “affordable running shoes” and find results for “budget jogging sneakers”
— different words, same meaning. This is powered by vector search:
converting everything into numbers and finding what’s mathematically close.
↓ Scroll to understand the database behind RAG, semantic search, and AI features
Why Keywords Fail
Traditional databases search by exact matching. But humans don’t think in keywords — we think in concepts.
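To see the failure mode concretely, here's a minimal sketch using Python's built-in sqlite3 (the documents and query are made up for illustration):

```python
import sqlite3

# A toy document store that searches by exact substring matching.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, content TEXT)")
conn.executemany(
    "INSERT INTO docs (content) VALUES (?)",
    [
        ("budget jogging sneakers for daily runs",),
        ("affordable running shoes under $50",),
        ("premium leather dress shoes",),
    ],
)

# Keyword search only finds rows containing the literal phrase.
rows = conn.execute(
    "SELECT content FROM docs WHERE content LIKE ?",
    ("%running shoes%",),
).fetchall()
print(rows)  # Finds the exact phrase, misses 'budget jogging sneakers' —
             # same meaning, different words.
```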
Embeddings: Turning Everything into Numbers
An embedding is a vector — a list of numbers — that captures the meaning of something. Text, images, audio — anything can be embedded.
How embeddings are created
Input: Any piece of text — a word, sentence, paragraph, or entire document. The embedding model doesn't care about length; it converts meaning into numbers.
Model: A neural network (like OpenAI's text-embedding-3-small or Sentence-BERT) trained on billions of text pairs. It learns to map semantically similar text to nearby points in vector space.
Output: A fixed-size array of 768–1536 floating-point numbers encoding the text's meaning. Similar concepts produce similar vectors: 'dog' [0.82, 0.15, ...] and 'puppy' [0.80, 0.18, ...] are nearly identical, while 'Python' [-0.21, 0.76, ...] is completely different.
text → [0.12, -0.45, 0.78, ..., 0.33]
Why do embeddings use 768–1536 dimensions instead of just 2 or 3?
💡 How many independent qualities describe a piece of text? Way more than 2...
Language has thousands of independent axes of meaning: sentiment, topic, formality, specificity, time reference, etc. Just 2 dimensions can't separate all these concepts. With 768+ dimensions, the embedding can capture subtle distinctions — like the difference between 'bank' (river) and 'bank' (finance). More dimensions = more expressive power.
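To make this concrete, here's a minimal sketch using the open-source sentence-transformers library (this particular model outputs 384 dimensions — smaller than the 768–1536 above, but the idea is identical):

```python
from sentence_transformers import SentenceTransformer

# A small open-source embedding model (384-dimensional output).
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = ["dog", "puppy", "Python programming"]
embeddings = model.encode(sentences, normalize_embeddings=True)

print(embeddings.shape)  # (3, 384) — one fixed-size vector per input
# With unit-normalized vectors, cosine similarity is just a dot product.
print(embeddings[0] @ embeddings[1])  # 'dog' vs 'puppy' — high
print(embeddings[0] @ embeddings[2])  # 'dog' vs 'Python programming' — low
```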
Measuring Similarity: Cosine Distance
Once everything is a vector, finding similar items is just measuring the angle between vectors.
Cosine Similarity — the standard metric
Cosine similarity measures the angle between two vectors by dividing their dot product by the product of their magnitudes. It captures direction (meaning) rather than magnitude (length), so a long document and a short tweet about the same topic score high.
cos(θ) = (A · B) / (||A|| × ||B||)
cos(θ) ≈ 1: The vectors point in exactly the same direction — the texts share the same meaning. 'Happy dog playing' and 'Joyful puppy having fun' would score near 1.0.
cos(θ) ≈ 0: The vectors are perpendicular — the texts have no semantic relationship at all. 'Dog' and 'Economics' are neither similar nor opposite, just completely unrelated topics.
cos(θ) ≈ -1: The vectors point in opposite directions — the texts convey opposite meanings. This is rare in practice because most embedding models don't map semantic opposites to negative cosine values.
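The formula is a one-liner to implement. A minimal sketch with toy 2-D vectors:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # cos(θ) = (A · B) / (||A|| × ||B||)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
c = np.array([-1.0, 0.0])

print(cosine_similarity(a, a))  #  1.0 — same direction, same meaning
print(cosine_similarity(a, b))  #  0.0 — perpendicular, unrelated
print(cosine_similarity(a, c))  # -1.0 — opposite directions
```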
Vectors A=[1,0] and B=[0,1] have cosine similarity of 0. What does this mean?
💡 What's the geometric meaning of cos(90°) = 0?
Cosine similarity of 0 means the vectors are perpendicular (90° angle). They share no semantic similarity. Think of it like 'dog' and 'economics' — not opposites, just completely unrelated topics. A cosine of -1 would mean opposites, and 1 would mean identical.
The Scale Problem: You Can’t Compare Everything
You have 100 million vectors. A user queries with a new vector. Finding the most similar one means computing 100 million cosine similarities — way too slow.
Brute force is O(n) — that doesn't scale
Brute force: compare the query to ALL n vectors.
At ~10 GFLOPS: ~15 seconds per query over 100 million vectors.
Solution: Approximate Nearest Neighbor (ANN) search.
Trade-off: 95–99% accuracy for ~1000x speed.
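Here's what that brute-force scan looks like — a sketch with random unit vectors standing in for real embeddings (100k rows instead of 100M, but the O(n · d) shape is the same):

```python
import numpy as np

rng = np.random.default_rng(0)

# 100k unit-normalized vectors standing in for a real corpus.
n, dim = 100_000, 768
db = rng.standard_normal((n, dim)).astype(np.float32)
db /= np.linalg.norm(db, axis=1, keepdims=True)

query = rng.standard_normal(dim).astype(np.float32)
query /= np.linalg.norm(query)

# One dot product per stored vector: O(n · dim) work for every query.
scores = db @ query
top5 = np.argsort(scores)[-5:][::-1]
print(top5, scores[top5])
```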
HNSW: The Algorithm Behind Vector Search
Hierarchical Navigable Small World — the most popular ANN algorithm. It builds a multi-layer graph where the sparse upper layers act as a “highway system” for fast navigation.
HNSW has multiple layers. What's the purpose of the upper (sparse) layers?
💡 Think of a real-world analogy: interstate highways vs. local streets...
The upper layers act like a highway system. They have few nodes with long-range connections, allowing the search algorithm to quickly jump to the right 'neighborhood' of the query. Then the search descends to lower layers (with more nodes and shorter connections) for precise, local search. This is why HNSW achieves O(log n) search time — the hierarchical structure halves the search space at each layer.
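In practice you rarely implement HNSW yourself. A sketch using the hnswlib library (the parameters shown are common starting points, not tuned values):

```python
import hnswlib
import numpy as np

dim, n = 128, 10_000
rng = np.random.default_rng(0)
data = rng.standard_normal((n, dim)).astype(np.float32)

# Build the multi-layer HNSW graph.
# M = max connections per node; ef_construction = build-time search width.
index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=n, ef_construction=200, M=16)
index.add_items(data, np.arange(n))

# ef = search-time beam width: higher → more accurate, slower.
index.set_ef(50)

query = rng.standard_normal((1, dim)).astype(np.float32)
labels, distances = index.knn_query(query, k=5)
print(labels, distances)  # approximate top-5 neighbors, ~O(log n) per query
```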
Vector Databases in the Real World
In a RAG system, why do you give retrieved documents to the LLM instead of just querying the LLM directly?
💡 What happens when you ask an LLM about your company's specific refund policy?
LLMs have a knowledge cutoff date and don't know about your private data. Without retrieval, they'll either say 'I don't know' or (worse) hallucinate a plausible but wrong answer. RAG gives the LLM actual source documents to reference, dramatically reducing hallucination and enabling it to answer about your specific data (company docs, recent events, etc.).
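A skeleton of that retrieve-then-generate flow, reusing the embedding model from earlier (the documents are toy examples, and the final LLM call is left out since it depends on your provider):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

docs = [
    "Refunds are available within 30 days of purchase.",
    "Shipping takes 3-5 business days.",
    "Our office is closed on public holidays.",
]
doc_vectors = model.encode(docs, normalize_embeddings=True)

# 1. Retrieve: embed the question, find the closest document.
question = "What is your refund policy?"
q = model.encode(question, normalize_embeddings=True)
context = docs[int(np.argmax(doc_vectors @ q))]

# 2. Augment: ground the prompt in the retrieved text.
prompt = f"Answer using ONLY this context:\n{context}\n\nQuestion: {question}"

# 3. Generate: send the prompt to any LLM (call omitted here).
print(prompt)
```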
You need to add semantic search to an existing PostgreSQL app with 1M documents. Which approach is most practical?
💡 What's the simplest solution that doesn't require adding new infrastructure?
pgvector is a PostgreSQL extension that adds vector column types and ANN search indexes directly to your existing database. No new infrastructure, no data sync, no new ops burden. For 1M documents, pgvector handles this easily with HNSW indexes. A dedicated vector DB (Pinecone, Qdrant) makes sense at 10M+ vectors or when you need specialized features.
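A sketch of what that looks like with psycopg2, assuming a running Postgres with pgvector 0.5+ installed (connection details and table names are placeholders):

```python
import psycopg2

conn = psycopg2.connect("dbname=app user=app password=secret host=localhost")
cur = conn.cursor()

cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
cur.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id bigserial PRIMARY KEY,
        content text,
        embedding vector(1536)
    )
""")
# HNSW index with cosine distance — the same algorithm described above.
cur.execute("""
    CREATE INDEX IF NOT EXISTS documents_embedding_idx
    ON documents USING hnsw (embedding vector_cosine_ops)
""")
conn.commit()

# <=> is pgvector's cosine-distance operator: smallest distance = most similar.
query_embedding = [0.1] * 1536  # placeholder — use a real query embedding
cur.execute(
    "SELECT content FROM documents ORDER BY embedding <=> %s::vector LIMIT 5",
    (str(query_embedding),),
)
print(cur.fetchall())
```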
🎓 What You Now Know
✓ Embeddings turn meaning into numbers — Neural networks convert text/images into high-dimensional vectors where similar concepts cluster together.
✓ Cosine similarity measures semantic closeness — It captures direction (meaning) rather than magnitude (length).
✓ ANN search trades accuracy for speed — Checking every vector is O(n) and too slow. ANN algorithms give ~99% accuracy at 1000x the speed.
✓ HNSW is the dominant algorithm — Multi-layer graph with fast long-range navigation at the top and precise local search at the bottom.
✓ RAG is the killer app — Retrieve relevant docs from a vector DB, feed them to an LLM, get grounded answers with less hallucination.
Vector databases are the infrastructure layer powering the AI revolution. Every chatbot, every AI search, every recommendation engine is built on these concepts. They’re to the AI era what SQL databases were to the web era. 🚀
📄 Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs (Malkov & Yashunin, 2016)
📄 Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (Lewis et al., 2020)