7 min deep dive · search · information-retrieval · NLP

BM25 — The 30-Year-Old Algorithm That Still Wins at Search

A visual deep dive into BM25 scoring. Understand every term in the formula — IDF, TF saturation via k₁, length normalization via b — and why BM25 still outperforms neural retrievers on specialized vocabularies.

Introduction

Published in 1994.
Still beating neural models.

BM25 (Best Match 25) is a bag-of-words scoring function that ranks documents by term frequency, inverse document frequency, and document length. It powers Elasticsearch, Solr, and Lucene. In the 2024 BEIR benchmark, BM25 outperformed fine-tuned dense retrieval models on 6 out of 18 datasets — especially on medical, legal, and scientific domains. The secret? Exact keyword matching wins when vocabulary is specialized.


BM25 Formula

BM25: Every Term Explained


📐 Full Formula

BM25 scores a document by summing over all query terms: each term's contribution is its IDF weight multiplied by a saturated, length-normalized term frequency.

BM25(q, d) = Σ IDF(t) × [tf×(k₁+1)] / [tf + k₁×(1−b+b×|d|/avgdl)]
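To make the formula concrete, here is a minimal BM25 scorer in Python that mirrors it term for term. This is a sketch, not any library's implementation: the name bm25_score and the doc_freqs/avgdl bookkeeping are illustrative.

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, doc_freqs, num_docs, avgdl, k1=1.2, b=0.75):
    """Score one document against a query with classic BM25.

    query_terms -- list of query tokens
    doc_terms   -- list of tokens in the document
    doc_freqs   -- dict: term -> number of documents containing it
    num_docs    -- total number of documents in the corpus (N)
    avgdl       -- average document length in the corpus
    """
    tf = Counter(doc_terms)      # raw term frequencies in this document
    doc_len = len(doc_terms)     # |d|
    score = 0.0
    for term in query_terms:
        if term not in tf:
            continue
        df = doc_freqs.get(term, 0)
        # IDF(t) = log[(N - df + 0.5) / (df + 0.5)]
        idf = math.log((num_docs - df + 0.5) / (df + 0.5))
        # Saturated, length-normalized term frequency
        numerator = tf[term] * (k1 + 1)
        denominator = tf[term] + k1 * (1 - b + b * doc_len / avgdl)
        score += idf * numerator / denominator
    return score
```

One detail worth knowing: the classic IDF above can go negative for terms that appear in more than half the corpus, which is why Lucene and Elasticsearch use a variant that adds 1 inside the log to keep IDF non-negative.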
💎 IDF Weight

Inverse Document Frequency gives rare terms higher weight. 'quantum' appears in few documents and gets a high IDF; 'the' appears everywhere and gets near-zero IDF.

IDF(t) = log[(N − df(t) + 0.5) / (df(t) + 0.5)]
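A quick numeric illustration, using a hypothetical corpus of one million documents (the document frequencies here are made up for the example):

```python
import math

N = 1_000_000                                   # hypothetical corpus size
for term, df in [("quantum", 1_000), ("the", 500_000)]:
    idf = math.log((N - df + 0.5) / (df + 0.5))
    print(f"{term}: IDF ≈ {idf:.2f}")
# quantum: IDF ≈ 6.91   (rare term, heavily weighted)
# the:     IDF ≈ 0.00   (appears in half the corpus, near-zero weight)
```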
📊 Term Frequency

Raw count of how many times a query term appears in the document. More occurrences signal higher relevance — but BM25 applies saturation so the 30th mention barely matters more than the 3rd.

🎛️ k₁ — TF Saturation

Controls how quickly additional term occurrences stop mattering. At k₁=0, frequency is ignored entirely. As k₁→∞, the contribution grows linearly with raw TF. The typical k₁=1.2 means mentioning a term 3× helps a lot, but 30× barely adds more.
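You can see the saturation directly by evaluating just the TF component, assuming a document of average length so the length factor drops out (a sketch; the saturated_tf name is illustrative):

```python
def saturated_tf(tf, k1=1.2):
    # TF component of BM25 with |d| == avgdl, so the length factor is 1
    return tf * (k1 + 1) / (tf + k1)

for tf in (1, 3, 10, 30):
    print(tf, round(saturated_tf(tf), 2))
# 1 -> 1.0, 3 -> 1.57, 10 -> 1.96, 30 -> 2.12  (the curve flattens fast)
```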

📏 b — Length Normalization

Controls whether longer documents are penalized. At b=0, document length is ignored. At b=1, fully normalized. The typical b=0.75 penalizes long docs (where terms appear more by chance) without being too aggressive.
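To make the effect concrete, here is how the denominator's length factor (1 − b + b×|d|/avgdl) behaves for a document twice the average length; the document lengths are hypothetical:

```python
def length_factor(doc_len, avgdl, b):
    # Multiplier applied to k1 in the BM25 denominator
    return 1 - b + b * doc_len / avgdl

for b in (0.0, 0.75, 1.0):
    print(b, length_factor(doc_len=2000, avgdl=1000, b=b))
# b=0.0  -> 1.0   (length ignored)
# b=0.75 -> 1.75  (typical: long doc's TF is discounted)
# b=1.0  -> 2.0   (penalty fully proportional to length)
```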

📐 Length Ratio

The ratio |d|/avgdl compares this document's length to the corpus average. A document twice as long as average gets its TF discounted — occurrences in longer documents are more likely to happen by chance.

|d|/avgdl = document length / average document length
🟢 Quick Check

In the BM25 formula, what does the IDF component reward?

TF Saturation

TF Saturation: The Key Insight

[Chart: BM25 score component vs. term frequency (tf), with curves for raw TF (k₁→∞), k₁ = 1.2, and k₁ = 0.5. At k₁=1.2: tf=3 → score 2.1, tf=30 → score 2.6 — 10× more occurrences yields only 24% more score.]
k₁ controls TF saturation: how quickly additional word occurrences stop mattering
🟡 Checkpoint

You're building a search engine for medical research papers. A user queries 'BRCA1 p.V600E mutation pathogenicity'. Which retrieval method would you use as Stage 1?

Now that you’ve seen what BM25 does and why it works, let’s test your intuition about the edge cases. The k₁ parameter is the single knob that controls TF saturation — and pushing it to its extremes reveals exactly how flexible the formula really is.

🟡 Checkpoint

You set k₁=0 in your BM25 configuration. What happens to the scoring behavior?

🎓 What You Now Know

BM25 = IDF × saturated TF with length normalization — a deceptively simple formula that’s been the backbone of search for 30 years.

k₁ controls TF saturation — at k₁=1.2, mentioning a term 3× is great but 30× is barely better. This prevents keyword stuffing.

b controls length normalization — at b=0.75, long documents are penalized because words appear more by chance in longer text.

BM25 beats neural models on specialized vocabularies — exact keyword matching wins when terms like “BRCA1 p.V600E” must be matched precisely.

Modern production search typically runs BM25 alongside dense retrieval — the two approaches are complementary, not competing. ⚡
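One common way to combine the two lists is reciprocal rank fusion. Here is a minimal sketch, not tied to any particular search stack; the document IDs and the k constant are illustrative:

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked lists of doc IDs; higher fused score = better."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical top-3 results from each retriever
bm25_hits = ["doc_42", "doc_7", "doc_13"]
dense_hits = ["doc_7", "doc_99", "doc_42"]
print(reciprocal_rank_fusion([bm25_hits, dense_hits]))
# ['doc_7', 'doc_42', 'doc_99', 'doc_13']
```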

Keep Learning