17 min deep-dive · systems · AI

Vector Databases — Search by Meaning, Not Keywords

A visual deep dive into vector databases. From embeddings to ANN search to HNSW — understand how AI-powered search finds what you actually mean, not just what you typed.

Introduction

Google doesn’t match words.
It matches meaning.

Search “affordable running shoes” and find results for “budget jogging sneakers”
— different words, same meaning. This is powered by vector search:
converting everything into numbers and finding what’s mathematically close.



Why Keywords Fail

Traditional databases search by exact matching. But humans don’t think in keywords — we think in concepts.

Keyword Search (SQL), query: “headache remedies”
✓ “Top 10 headache remedies”
✗ “How to treat a migraine” (MISS)
✗ “Pain relief for head” (MISS)
✗ “Aspirin vs ibuprofen” (MISS)
Only finds exact word matches.

Vector Search, query: “headache remedies”
✓ “Top 10 headache remedies” (0.95)
✓ “How to treat a migraine” (0.89)
✓ “Pain relief for head” (0.87)
✓ “Aspirin vs ibuprofen” (0.82)
Finds meaning, not just words.
Keyword search misses semantically identical queries
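Here is a minimal Python sketch of the difference. The vectors are hand-picked toy numbers standing in for the real embeddings explained in the next section, not output from an actual model:

```python
import math

# Toy 4-dim "embeddings", hand-picked for illustration (not from a real model)
docs = {
    "Top 10 headache remedies":  [0.90, 0.10, 0.05, 0.20],
    "How to treat a migraine":   [0.85, 0.15, 0.05, 0.25],
    "Aspirin vs ibuprofen":      [0.70, 0.30, 0.10, 0.30],
    "Best hiking trails":        [0.05, 0.10, 0.90, 0.40],
}
query_vec = [0.88, 0.12, 0.05, 0.22]  # toy vector for "headache remedies"

# Keyword search: exact word matching misses every paraphrase
print([d for d in docs if "headache" in d.lower()])
# -> ['Top 10 headache remedies']

# Vector search: rank all documents by cosine similarity instead
def cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

for doc, vec in sorted(docs.items(), key=lambda kv: -cos(query_vec, kv[1])):
    print(f"{cos(query_vec, vec):.2f}  {doc}")
# The migraine and aspirin docs now rank high despite sharing no keywords
```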

Embeddings: Turning Everything into Numbers

An embedding is a vector — a list of numbers — that captures the meaning of something. Text, images, audio — anything can be embedded.

(Simplified to 2D; real embeddings have 768-1536 dimensions.) Similar items form clusters: Animals 🐾 (dog, cat, puppy), Programming 💻 (Python, JavaScript, code), Food 🍕 (pizza, sushi, burger). Close = similar; far apart = different.
Similar concepts cluster together in embedding space

How embeddings are created

📝 Input Text → 🧠 Embedding Model → 📊 Output Vector
📝 Input Text

Any piece of text — a word, sentence, paragraph, or entire document — serves as input. The embedding model doesn't care about length; it converts meaning into numbers.

🧠 Embedding Model

A neural network (like OpenAI's text-embedding-3-small or Sentence-BERT) trained on billions of text pairs. It learns to map semantically similar text to nearby points in vector space.

📊 Output Vector

A fixed-size array of 768–1536 floating-point numbers encoding the text's meaning. Similar concepts produce similar vectors: 'dog' [0.82, 0.15, ...] and 'puppy' [0.80, 0.18, ...] are nearly identical, while 'Python' [-0.21, 0.76, ...] is completely different.

text → [0.12, -0.45, 0.78, ..., 0.33]
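A minimal sketch of this pipeline, assuming the sentence-transformers package is installed. all-MiniLM-L6-v2 is one real open model (it outputs 384 dimensions); any embedding model follows the same input → model → vector pattern:

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer

# all-MiniLM-L6-v2 outputs 384-dim vectors; hosted models like OpenAI's
# text-embedding-3-small output 1536-dim vectors through an API instead.
model = SentenceTransformer("all-MiniLM-L6-v2")

vectors = model.encode(["dog", "puppy", "Python"])
print(vectors.shape)   # (3, 384): one fixed-size vector per input text
print(vectors[0][:4])  # first few floats of the "dog" vector
```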
🟢 Quick Check

Why do embeddings use 768-1536 dimensions instead of just 2 or 3?


Measuring Similarity: Cosine Distance

Once everything is a vector, finding similar items is just measuring the angle between vectors.

Cosine Similarity — the standard metric

📐 The Formula

Cosine similarity measures the angle between two vectors by dividing their dot product by the product of their magnitudes. It captures direction (meaning) rather than magnitude (length), so a long document and a short tweet about the same topic score high.

cos(θ) = (A · B) / (||A|| × ||B||)
🎯 Score = 1.0 (Identical)

The vectors point in exactly the same direction — the texts share the same meaning. 'Happy dog playing' and 'Joyful puppy having fun' would score near 1.0.

↔️ Score = 0.0 (Unrelated)

The vectors are perpendicular — the texts have no semantic relationship at all. 'Dog' and 'Economics' are neither similar nor opposite, just completely unrelated topics.

🔄 Score = -1.0 (Opposite)

The vectors point in opposite directions: the texts convey opposite meanings. This is rare in practice, because most embedding models don't map semantic opposites to strongly negative cosine values, so scores below zero are uncommon.

“dog” vs “puppy”: θ = 8° → cos(θ) = 0.99 (nearly parallel = very similar). “dog” vs “Python”: θ = 70° → cos(θ) = 0.34 (wide angle = very different).
Cosine similarity measures the angle, not the magnitude
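The formula translates directly into NumPy. A short sketch that verifies the three score cases above:

```python
import numpy as np

def cosine_similarity(a, b):
    """cos(theta) = (A . B) / (||A|| * ||B||)"""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity([1, 0], [1, 0]))   #  1.0 -> identical direction
print(cosine_similarity([1, 0], [0, 1]))   #  0.0 -> perpendicular, unrelated
print(cosine_similarity([1, 0], [-1, 0]))  # -1.0 -> opposite
print(cosine_similarity([1, 2], [2, 4]))   #  1.0 -> magnitude is ignored
```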
🟡 Checkpoint

Vectors A=[1,0] and B=[0,1] have cosine similarity of 0. What does this mean?


The Scale Problem: You Can’t Compare Everything

You have 100 million vectors. A user queries with a new vector. Finding the most similar one means computing 100 million cosine similarities — way too slow.

Brute force is O(n) — that doesn't scale

1. Brute force: compare the query to ALL n vectors. 100M vectors × 1536 dimensions ≈ 150 billion floating-point operations per query.
2. At ~10 GFLOPS, that is 15 seconds per query. No one waits 15 seconds for search results.
3. Solution: Approximate Nearest Neighbor (ANN) search. Find “close enough” results in milliseconds by not checking everything.
4. Trade-off: 95-99% accuracy for 1000× the speed. You might miss the absolute closest vector, but you'll find one that's nearly as close (the sketch below shows the brute-force cost that ANN avoids).
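A brute-force sketch at a scale a laptop can handle (100k vectors instead of 100M; the sizes are assumptions for illustration). Normalizing every vector once up front turns each dot product into a cosine similarity, but the scan is still one comparison per stored vector:

```python
import numpy as np

rng = np.random.default_rng(0)
n, dim = 100_000, 1536            # scaled down from the 100M in the text
db = rng.standard_normal((n, dim)).astype(np.float32)
db /= np.linalg.norm(db, axis=1, keepdims=True)  # unit vectors: dot == cosine

query = rng.standard_normal(dim).astype(np.float32)
query /= np.linalg.norm(query)

# O(n): one 1536-dim dot product per stored vector -- the cost ANN avoids
scores = db @ query
best = int(np.argmax(scores))
print(best, float(scores[best]))
```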
HNSW

Hierarchical Navigable Small World is the most popular ANN algorithm. It builds a multi-layer graph whose sparse upper layers act as a “highway system” for fast long-range navigation.

Layer 2 (Express): A, G. Layer 1 (Local): A, C, E, G. Layer 0 (Dense): A, B, C, D, E, F, G, H, I. Search “find nearest to query F”: jump A → E on the upper layers (E is closest to F), descend, found F! Only 4 nodes checked instead of 9: O(log n) instead of O(n).
HNSW: start at the top layer (sparse, fast jumps) and descend to the bottom layer (dense, precise)
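A sketch of the same idea using the hnswlib library, assuming it is installed; the parameter values shown are typical starting points, not tuned settings:

```python
# pip install hnswlib
import hnswlib
import numpy as np

dim, n = 384, 100_000
data = np.random.default_rng(0).standard_normal((n, dim)).astype(np.float32)

index = hnswlib.Index(space="cosine", dim=dim)
# M: links per node (graph density); ef_construction: build-time search width
index.init_index(max_elements=n, ef_construction=200, M=16)
index.add_items(data, np.arange(n))

index.set_ef(50)  # query-time search width: higher = more accurate, slower
labels, distances = index.knn_query(data[:1], k=5)  # approximate top-5
print(labels[0], 1 - distances[0])  # hnswlib reports cosine distance = 1 - sim
```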
🟡 Checkpoint

HNSW has multiple layers. What's the purpose of the upper (sparse) layers?


Vector Databases in the Real World

User query (“What's our refund policy?”) → Vector DB (embed the query, find the top-5 similar docs) → Context (relevant chunks from refund_policy.pdf, returns_faq.md, terms_of_service.pdf) → 🤖 LLM (query + context → answer). RAG “grounds the LLM in your data”: it reduces hallucination by giving the LLM actual source documents to reference.
RAG (Retrieval-Augmented Generation): the #1 use case for vector databases today
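In code, the whole flow reduces to embed → retrieve → prompt. A hypothetical sketch: embed, vector_db, and llm here are placeholder objects, not any specific library's API:

```python
def answer_with_rag(question: str, embed, vector_db, llm, k: int = 5) -> str:
    """Hypothetical RAG flow; embed/vector_db/llm are placeholders."""
    query_vec = embed(question)                    # 1. embed the user query
    chunks = vector_db.search(query_vec, top_k=k)  # 2. top-k similar chunks
    context = "\n\n".join(chunk.text for chunk in chunks)
    prompt = (
        "Answer using ONLY the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    return llm.complete(prompt)                    # 3. grounded generation
```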
Pinecone: fully managed, the most popular. Weaviate: open source, hybrid search. Qdrant: Rust-based, fastest performance. pgvector: PostgreSQL extension, adds vectors to SQL.

Use cases: RAG chatbots, semantic search, recommendation engines, image similarity, anomaly detection, deduplication, personalization.
The vector database landscape
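For the pgvector route, a sketch assuming PostgreSQL with the pgvector extension and the psycopg driver. The table and column names are made up for illustration; vector(1536) and the <=> cosine-distance operator are pgvector syntax:

```python
# pip install psycopg   -- and in Postgres: CREATE EXTENSION vector;
import psycopg

query_vec = [0.0] * 1536  # stand-in: use your embedding model's output here
vec_literal = "[" + ",".join(map(str, query_vec)) + "]"

with psycopg.connect("dbname=app") as conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS docs (
            id        bigserial PRIMARY KEY,
            body      text,
            embedding vector(1536)
        )""")
    # HNSW index (pgvector >= 0.5) so queries don't scan every row
    cur.execute("CREATE INDEX IF NOT EXISTS docs_hnsw "
                "ON docs USING hnsw (embedding vector_cosine_ops)")
    # <=> is cosine distance; ORDER BY ... LIMIT 5 = top-5 nearest docs
    cur.execute("SELECT id, body FROM docs "
                "ORDER BY embedding <=> %s::vector LIMIT 5",
                (vec_literal,))
    print(cur.fetchall())
```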
🟡 Checkpoint

In a RAG system, why do you give retrieved documents to the LLM instead of just querying the LLM directly?

🔴 Challenge

You need to add semantic search to an existing PostgreSQL app with 1M documents. Which approach is most practical?

🎓 What You Now Know

Embeddings turn meaning into numbers — Neural networks convert text/images into high-dimensional vectors where similar concepts cluster together.

Cosine similarity measures semantic closeness — It captures direction (meaning) rather than magnitude (length).

ANN search trades accuracy for speed — Checking every vector is O(n) and too slow. ANN algorithms give ~99% accuracy at 1000x the speed.

HNSW is the dominant algorithm — Multi-layer graph with fast long-range navigation at the top and precise local search at the bottom.

RAG is the killer app — Retrieve relevant docs from a vector DB, feed them to an LLM, get grounded answers with less hallucination.

Vector databases are the infrastructure layer powering the AI revolution. Every chatbot, every AI search, every recommendation engine is built on these concepts. They’re to the AI era what SQL databases were to the web era. 🚀

📄 Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs (Malkov & Yashunin, 2016)


📄 Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (Lewis et al., 2020)
