BM25 — The 30-Year-Old Algorithm That Still Wins at Search
A visual deep dive into BM25 scoring. Understand every term in the formula — IDF, TF saturation via k₁, length normalization via b — and why BM25 still outperforms neural retrievers on specialized vocabularies.
Published in 1994.
Still beating neural models.
BM25 (Best Match 25) is a bag-of-words scoring function that ranks documents by term frequency, inverse document frequency, and document length. It powers Elasticsearch, Solr, and Lucene. On the BEIR benchmark, BM25 outperformed fine-tuned dense retrieval models on 6 out of 18 datasets — especially on medical, legal, and scientific domains. The secret? Exact keyword matching wins when vocabulary is specialized.
↓ Scroll to understand every term in the BM25 formula
BM25: Every Term Explained
BM25 scores a document by summing over all query terms: each term's contribution is its IDF weight multiplied by a saturated, length-normalized term frequency.
BM25(q, d) = Σ IDF(t) × [tf×(k₁+1)] / [tf + k₁×(1−b+b×|d|/avgdl)]

IDF(t): Inverse Document Frequency gives rare terms higher weight. 'quantum' appears in few documents and gets a high IDF; 'the' appears everywhere and gets near-zero IDF.
IDF(t) = log[(N − df(t) + 0.5) / (df(t) + 0.5)]

tf: Raw count of how many times a query term appears in the document. More occurrences signal higher relevance — but BM25 applies saturation so the 30th mention barely matters more than the 3rd.

k₁: Controls how quickly additional term occurrences stop mattering. At k₁=0, frequency is ignored entirely. At k₁=∞, TF grows linearly. The typical k₁=1.2 means mentioning a term 3× helps a lot, but 30× barely adds more.

b: Controls whether longer documents are penalized. At b=0, document length is ignored. At b=1, fully normalized. The typical b=0.75 penalizes long docs (where terms appear more by chance) without being too aggressive.

|d|/avgdl: The ratio compares this document's length to the corpus average. A document twice as long as average gets its TF discounted — occurrences in longer documents are more likely to happen by chance.
|d|/avgdl = document length / average document length

In the BM25 formula, what does the IDF component reward?
💡 IDF stands for Inverse DOCUMENT Frequency — think about what 'inverse' means here.
IDF = log[(N - df(t) + 0.5) / (df(t) + 0.5)]. When df(t) is small (rare term), the numerator is large relative to the denominator, producing a high IDF score. A term appearing in 10 out of 1 million documents gets a much higher IDF than a term appearing in 500,000 documents. This is why BM25 naturally prioritizes distinctive, informative terms — a query for 'mitochondria function' will weight 'mitochondria' (rare) far more than 'function' (common).
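Putting the pieces together, here is a minimal pure-Python sketch of the full scoring function over a toy corpus. The whitespace tokenization, the four example documents, and the defaults k₁=1.2 and b=0.75 are illustrative assumptions, not a production index.

import math
from collections import Counter

def bm25_score(query_terms, doc_terms, corpus, k1=1.2, b=0.75):
    """BM25 score of one document for a query (toy sketch, not an optimized index)."""
    N = len(corpus)                                  # total number of documents
    avgdl = sum(len(d) for d in corpus) / N          # average document length
    tf = Counter(doc_terms)                          # term frequencies in this document
    score = 0.0
    for t in query_terms:
        df = sum(1 for d in corpus if t in d)        # how many documents contain t
        if df == 0 or tf[t] == 0:
            continue                                 # term contributes nothing
        idf = math.log((N - df + 0.5) / (df + 0.5))  # the IDF formula above (can dip below zero for very common terms)
        norm = 1 - b + b * len(doc_terms) / avgdl    # length normalization
        score += idf * (tf[t] * (k1 + 1)) / (tf[t] + k1 * norm)
    return score

corpus = [
    "quantum computing uses qubits for computation".split(),
    "the cat sat on the mat".split(),
    "classical computers use bits and logic gates".split(),
    "the dog chased the cat around the garden".split(),
]
query = "quantum computing".split()
for doc in corpus:
    print(f"{bm25_score(query, doc, corpus):6.3f}  {' '.join(doc)}")

Only the first document scores above zero for this query; 'quantum' and 'computing' are rare in the toy corpus, so each matched term carries a high IDF weight.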
TF Saturation: The Key Insight
You're building a search engine for medical research papers. A user queries 'BRAF p.V600E mutation pathogenicity'. Which retrieval method would you use as Stage 1?
💡 Think about what happens when a neural model embeds 'p.V600E' — does it distinguish it from 'p.V600K'?
Medical, legal, and scientific search have specialized vocabularies where exact term matching is critical. 'p.V600E' is a specific point mutation — a dense retriever might confuse it with 'p.V600K' or 'p.V600R' since they're semantically similar. BM25 matches the exact string. In the BEIR benchmark, BM25 outperformed dense models specifically on BioASQ (medical) and SciFact (scientific) datasets. The best approach: use BM25 as Stage 1 retrieval to capture exact keyword matches, then rerank the top results with a cross-encoder for quality.
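Here is a hedged sketch of what that two-stage setup could look like, assuming the rank_bm25 and sentence-transformers packages and the public ms-marco MiniLM cross-encoder checkpoint; the tiny corpus, the lowercase whitespace tokenizer, and the top-100 cutoff are placeholders.

from rank_bm25 import BM25Okapi                  # assumed: pip install rank_bm25
from sentence_transformers import CrossEncoder   # assumed: pip install sentence-transformers

corpus = [
    "BRAF p.V600E is a pathogenic activating mutation common in melanoma.",
    "BRAF p.V600K is a distinct variant with different clinical behavior.",
    "BRCA1 frameshift variants are linked to hereditary breast cancer.",
    # ... thousands more abstracts in a real index
]
query = "BRAF p.V600E mutation pathogenicity"

# Stage 1: BM25 over whitespace tokens, so exact strings like 'p.V600E' survive intact.
tokenized = [doc.lower().split() for doc in corpus]
bm25 = BM25Okapi(tokenized)
scores = bm25.get_scores(query.lower().split())
candidates = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)[:100]

# Stage 2: cross-encoder reranking of the BM25 candidates for semantic quality.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # assumed checkpoint name
pairs = [(query, corpus[i]) for i in candidates]
rerank_scores = reranker.predict(pairs)
reranked = [i for _, i in sorted(zip(rerank_scores, candidates), reverse=True)]
for i in reranked[:3]:
    print(corpus[i])

Only documents that literally contain the token 'p.V600E' get credit for it, so BM25 ranks the exact-match abstract first rather than a semantically close p.V600K paper; the cross-encoder then reorders the candidates by deeper relevance.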
Now that you’ve seen what BM25 does and why it works, let’s test your intuition about the edge cases. The k₁ parameter is the single knob that controls TF saturation — and pushing it to its extremes reveals exactly how flexible the formula really is.
You set k₁=0 in your BM25 configuration. What happens to the scoring behavior?
💡 Plug k₁=0 into the formula: tf × (k₁ + 1) / (tf + k₁ × ...) and simplify.
When k₁=0, the TF saturation formula becomes: tf × (0 + 1) / (tf + 0 × ...) = tf / tf = 1 for any non-zero tf. This means term frequency is completely flattened — whether a term appears once or a thousand times, the TF component contributes the same score (1.0). Only IDF distinguishes query terms. This is sometimes useful for boolean-style search where you only care about term presence, not frequency. At the other extreme, k₁=∞ makes TF grow linearly with no saturation — which is vulnerable to keyword stuffing.
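The saturation behavior is easy to see by evaluating just the TF factor of the formula at a few values of k₁. A minimal sketch (b is set to 0 here so length normalization drops out):

def tf_factor(tf, k1, b=0.0, dl_ratio=1.0):
    """The TF component of BM25: tf*(k1+1) / (tf + k1*(1 - b + b*dl_ratio))."""
    return tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl_ratio))

for k1 in (0.0, 1.2, 100.0):    # k1=100 stands in for the k1 -> infinity regime
    values = [round(tf_factor(tf, k1), 2) for tf in (1, 3, 10, 30)]
    print(f"k1={k1:>5}: tf=1,3,10,30 -> {values}")

# k1=0.0  : every non-zero tf collapses to 1.0 (pure presence/absence)
# k1=1.2  : fast saturation; tf=30 scores only slightly above tf=3
# k1=100  : keeps growing with tf, approaching linear growth, which invites keyword stuffing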
🎓 What You Now Know
✓ BM25 = IDF × saturated TF with length normalization — a deceptively simple formula that’s been the backbone of search for 30 years.
✓ k₁ controls TF saturation — at k₁=1.2, mentioning a term 3× is great but 30× is barely better. This prevents keyword stuffing.
✓ b controls length normalization — at b=0.75, long documents are penalized because words appear more by chance in longer text.
✓ BM25 beats neural models on specialized vocabularies — exact keyword matching wins when terms like “BRAF p.V600E” must be matched precisely.
Modern production search always includes BM25 alongside dense retrieval — the two approaches are complementary, not competing. ⚡
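One standard way to combine the two result lists is Reciprocal Rank Fusion: each system contributes 1/(k + rank) for every document it returns, so anything ranked highly by either BM25 or the dense retriever rises to the top of the fused list. A minimal sketch, assuming you already have the two ranked lists of document IDs (the constant k=60 is the conventional default, not a requirement):

from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs; a higher fused score means a better document."""
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical ranked outputs from the two retrievers for one query
bm25_hits  = ["doc7", "doc2", "doc9", "doc4"]    # exact keyword matches
dense_hits = ["doc2", "doc5", "doc7", "doc1"]    # semantic matches
print(reciprocal_rank_fusion([bm25_hits, dense_hits]))
# doc2 and doc7 come out on top because both systems rank them well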
↗ Keep Learning
Cross-Encoders vs Bi-Encoders — Why Accuracy Costs 1000× More Compute
A visual deep dive into the architectural difference between bi-encoders and cross-encoders. Why cross-attention produces higher-quality relevance scores, and why cross-encoders can only be used for reranking — never for retrieval.
Search Reranking — The Two-Stage Pipeline That Powers Production Search
A visual deep dive into the retrieve + rerank pipeline. How BM25, dense retrieval, and learned sparse retrieval feed into Reciprocal Rank Fusion, then cross-encoder reranking — with a full latency budget breakdown.
Approximate Nearest Neighbor Search — Trading 1% Accuracy for 1000× Speed
A visual deep dive into ANN search. Why brute-force nearest neighbor fails at scale, how approximate methods achieve 99% recall with logarithmic query time, and the fundamental accuracy-speed tradeoff behind every vector search system.
Naive Bayes — Why 'Stupid' Assumptions Work Brilliantly
A scroll-driven visual deep dive into Naive Bayes. Learn Bayes' theorem, why the 'naive' independence assumption is wrong but works anyway, and why it dominates spam filtering.