Cross-Encoders vs Bi-Encoders — Why Accuracy Costs 1000× More Compute
A visual deep dive into the architectural difference between bi-encoders and cross-encoders. Why cross-attention produces higher-quality relevance scores, and why cross-encoders can only be used for reranking — never for retrieval.
🔬
Same BERT model.
5-15% NDCG difference.
A bi-encoder encodes query and document separately, then compares vectors with cosine similarity. A cross-encoder concatenates query and document together, letting every query token attend to every document token. Same BERT, dramatically different accuracy — but the cross-encoder is 1000× more expensive at query time. This tradeoff defines modern search architecture.
↓ Scroll to understand the architecture that makes reranking possible
Bi-Encoders vs. Cross-Encoders: The Critical Difference
A bi-encoder and cross-encoder use the SAME BERT model. Why does the cross-encoder produce more accurate relevance scores?
💡 Think about what happens inside the BERT self-attention layers when tokens from query and document are together vs. separate.
Both use identical BERT architectures with the same number of parameters. The difference is purely architectural: the bi-encoder runs BERT twice (once on query, once on doc) and compares output vectors. The cross-encoder runs BERT once on [CLS] query [SEP] document, letting self-attention create CROSS-interactions between query and document tokens. When BERT sees 'apple stock price [SEP] Apple Inc. reported quarterly earnings', the attention mechanism links 'apple' in the query with 'Inc.' and 'earnings' in the document — disambiguating that this is about the company, not the fruit. The bi-encoder can't do this because it must embed 'apple' without seeing the document.
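To make the input difference concrete, here is a minimal sketch using the Hugging Face transformers tokenizer (bert-base-uncased is just an example checkpoint). The point is the shape of the input: the bi-encoder tokenizes query and document as two separate sequences for two separate BERT passes, while the cross-encoder tokenizes them as one joint sequence that self-attention processes together.

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")

query = "apple stock price"
doc = "Apple Inc. reported strong quarterly earnings..."

# Bi-encoder: two independent sequences, encoded in two separate BERT passes.
bi_query_inputs = tok(query, return_tensors="pt")
bi_doc_inputs = tok(doc, return_tensors="pt")

# Cross-encoder: one joint sequence -- [CLS] query tokens [SEP] document tokens [SEP] --
# so every query token can attend to every document token in a single BERT pass.
cross_inputs = tok(query, doc, return_tensors="pt")
print(tok.convert_ids_to_tokens(cross_inputs["input_ids"][0]))
# ['[CLS]', 'apple', 'stock', 'price', '[SEP]', 'apple', 'inc', '.', 'reported', ...]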
Why cross-encoders are more accurate
Bi-encoder:    score = cosine(BERT(q), BERT(d))
Cross-encoder: score = sigmoid(BERT([CLS] q [SEP] d))

Example: query = 'apple stock price', doc = 'Apple Inc. reported strong quarterly earnings...'
Bi-encoder:    encodes 'apple' without knowing you mean the COMPANY
Cross-encoder: sees 'apple' + 'stock' + 'Inc.' + 'earnings' together → knows it's about the company

A cross-encoder is far more accurate than a bi-encoder. Why not just use a cross-encoder for everything?
💡 How many BERT forward passes does each architecture need per query?
This is the fundamental limitation. A bi-encoder encodes each document ONCE offline and stores the vector. At query time, you encode only the query (1 BERT call) and do ANN lookup (~1ms). A cross-encoder can't pre-compute anything — the score for 'apple stock price' vs document D depends on BOTH inputs concatenated. So for each query, you'd need N forward passes through BERT. That's why cross-encoders are only used as Stage 2 rerankers on the top ~100-1000 candidates from Stage 1.
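A rough sketch of that asymmetry, assuming the sentence-transformers library (the checkpoint names below are illustrative, not a recommendation): the document embeddings are computed once offline, the bi-encoder then spends one forward pass per query, while the cross-encoder spends one forward pass per (query, document) pair it scores.

from sentence_transformers import SentenceTransformer, CrossEncoder, util

# Illustrative checkpoints -- any bi-encoder / cross-encoder pair behaves the same way.
bi_encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

docs = [
    "Apple Inc. reported strong quarterly earnings...",
    "Apple pie recipes for the fall season.",
    "How to plant and prune an apple orchard.",
]

# Offline, once per document: the only place the bi-encoder pays a per-document cost.
doc_embs = bi_encoder.encode(docs, convert_to_tensor=True)

query = "apple stock price"

# Bi-encoder at query time: ONE forward pass, then cheap vector math over stored embeddings.
q_emb = bi_encoder.encode(query, convert_to_tensor=True)
bi_scores = util.cos_sim(q_emb, doc_embs)

# Cross-encoder at query time: one forward pass PER (query, document) pair -- nothing can be cached.
cross_scores = cross_encoder.predict([(query, d) for d in docs])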
You want to build a semantic search system over 50M documents with the HIGHEST possible accuracy. Which architecture do you choose?
💡 Neither pure bi-encoder nor pure cross-encoder is optimal alone. What if you combined them?
This is the retrieve-then-rerank paradigm that powers modern search. Stage 1 (bi-encoder + ANN): encode the query once, retrieve top 100 candidates in ~5ms via ANN index. Stage 2 (cross-encoder): run BERT on each of the 100 (query, candidate) pairs in ~500ms total. The result combines bi-encoder scalability with cross-encoder accuracy. Running a cross-encoder over all 50M documents would take 50M × 5ms = ~69 hours per query. The two-stage approach is the only way to get cross-encoder quality at interactive speed.
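Here is a minimal sketch of the two-stage loop, again assuming sentence-transformers with illustrative checkpoint names. Exhaustive cosine search over a toy corpus stands in for the ANN index you would use over 50M documents; the structure of the pipeline is the same.

from sentence_transformers import SentenceTransformer, CrossEncoder, util

# Illustrative checkpoints; swap in whichever bi-encoder / cross-encoder you actually deploy.
bi_encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

corpus = [
    "Apple Inc. reported strong quarterly earnings driven by services revenue.",
    "AAPL shares rose after the earnings call.",
    "Apple pie recipes for the fall season.",
    "How to plant and prune an apple orchard.",
]

# Offline: embed the corpus once. At 50M documents these vectors would live in an ANN
# index (HNSW, IVF, etc.); exhaustive cosine search stands in for that lookup here.
corpus_embs = bi_encoder.encode(corpus, convert_to_tensor=True)

def search(query, retrieve_k=3, final_k=2):
    # Stage 1 -- bi-encoder retrieval: one query encoding + fast vector lookup.
    q_emb = bi_encoder.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(q_emb, corpus_embs, top_k=retrieve_k)[0]

    # Stage 2 -- cross-encoder reranking: one BERT pass per surviving candidate only.
    pairs = [(query, corpus[h["corpus_id"]]) for h in hits]
    rerank_scores = cross_encoder.predict(pairs)

    reranked = sorted(zip(hits, rerank_scores), key=lambda x: x[1], reverse=True)
    return [(corpus[h["corpus_id"]], float(score)) for h, score in reranked[:final_k]]

print(search("apple stock price"))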
🎓 What You Now Know
✓ Bi-encoders encode query and doc separately — docs can be pre-computed offline, enabling ANN lookup at query time. Fast but lossy.
✓ Cross-encoders see query+doc together — full cross-attention produces 5-15% NDCG improvement. But requires a BERT pass per pair.
✓ The information bottleneck is the key — bi-encoders compress entire documents to 768 numbers. Cross-encoders have no such constraint.
✓ Cross-encoders can only rerank, never retrieve — 1B docs × 50ms = 1.5 years per query. Use them on the top 50-100 candidates only.
The two-stage retrieve + rerank pipeline combines the best of both: bi-encoder speed with cross-encoder accuracy. ⚡
↗ Keep Learning
Search Reranking — The Two-Stage Pipeline That Powers Production Search
A visual deep dive into the retrieve + rerank pipeline. How BM25, dense retrieval, and learned sparse retrieval feed into Reciprocal Rank Fusion, then cross-encoder reranking — with a full latency budget breakdown.
BM25 — The 30-Year-Old Algorithm That Still Wins at Search
A visual deep dive into BM25 scoring. Understand every term in the formula — IDF, TF saturation via k₁, length normalization via b — and why BM25 still outperforms neural retrievers on specialized vocabularies.
Transformers — The Architecture That Changed AI
A scroll-driven visual deep dive into the Transformer architecture. From RNNs to self-attention to GPT — understand the engine behind every modern AI model.
Approximate Nearest Neighbor Search — Trading 1% Accuracy for 1000× Speed
A visual deep dive into ANN search. Why brute-force nearest neighbor fails at scale, how approximate methods achieve 99% recall with logarithmic query time, and the fundamental accuracy-speed tradeoff behind every vector search system.