16 min deep-dive · NLP · sequence-labeling · NER

Named Entity Recognition — Teaching Machines to Find Names in Text

A scroll-driven visual deep dive into Named Entity Recognition (NER). From rule-based to CRF to transformer-based approaches — learn how search engines and email services extract people, places, companies, and dates from unstructured text.

Introduction

Your inbox knows who, what, where, and when.

“Meeting with Dr. Sarah Chen at Google HQ in Mountain View on March 15th at 2pm.” Gmail extracts every entity — person, organization, location, date — and auto-creates a calendar event. That’s Named Entity Recognition: finding structured facts in unstructured text.


Entity Types

What Are Named Entities?

👤 PERSON (PER): Elon Musk, Dr. Sarah Chen, Einstein
🏢 ORGANIZATION (ORG): Google, United Nations, Stanford
📍 LOCATION (LOC): New York, Pacific Ocean, Everest
📅 DATE / TIME: March 15th, 2025, next Tuesday, 2pm

Extended entity types (domain-specific):
💰 MONEY: $14.5 billion | 📧 EMAIL: john@gmail.com | 📞 PHONE: (555) 123-4567
🔬 MEDICAL: aspirin, diabetes | 📜 LEGAL: Section 10b, GDPR | 💻 CODE: Python 3.12, GPT-4

Biomedical NER alone has 100+ entity types: proteins, genes, diseases, drugs, dosages…
Standard NER identifies 4+ entity types — each highlighted differently in the text
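To make these labels concrete, here is a minimal sketch using spaCy's pretrained English pipeline (assuming spaCy and its en_core_web_sm model are installed). Note that label names vary by library: spaCy, for example, tags geopolitical locations as GPE rather than LOC.

```python
# Minimal sketch: extracting named entities with spaCy's pretrained English
# pipeline (assumes `pip install spacy` and `python -m spacy download en_core_web_sm`).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Meeting with Dr. Sarah Chen at Google HQ in Mountain View on March 15th at 2pm.")

for ent in doc.ents:
    # ent.label_ is spaCy's entity type (PERSON, ORG, GPE, DATE, TIME, ...)
    print(f"{ent.text:20} {ent.label_}")
```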
Approaches

Three Eras of NER

Rule-Based (1990s–early 2000s)
• Regex patterns • Gazetteers (name lists) • POS tag rules
✓ Interpretable ✓ No training data ✓ Precise for known patterns
✗ Brittle, high maintenance ✗ Can't handle unseen entities

Statistical, CRF (2000s–2015)
• Conditional Random Fields • Hand-crafted features • Joint label prediction
✓ Models label dependencies ✓ Good with less data ✓ Well-understood theory
✗ Feature engineering heavy ✗ Linear, so can't model complex patterns

Neural, BERT (2018–present)
• BERT + token classifier • Auto-learns features • Contextual embeddings
✓ State-of-the-art accuracy ✓ No feature engineering ✓ Transfer learning
✗ Needs GPU + data ✗ Expensive inference
NER has evolved from handcrafted rules to neural sequence labeling
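To get a feel for the rule-based era, here is a minimal sketch that combines a tiny hand-written gazetteer with a date regex. Both the list and the pattern are illustrative toys; real rule-based systems maintained large curated lists and hundreds of such patterns.

```python
# Minimal sketch of rule-based NER: a gazetteer lookup plus a regex for dates.
# The gazetteer and pattern are illustrative toys, not a production rule set.
import re

ORG_GAZETTEER = {"Google", "Apple Inc", "United Nations", "Stanford"}
DATE_PATTERN = re.compile(r"\b(?:January|February|March|April|May|June|July|"
                          r"August|September|October|November|December)"
                          r"\s+\d{1,2}(?:st|nd|rd|th)?\b")

def rule_based_ner(text):
    entities = []
    for org in ORG_GAZETTEER:                      # exact-match gazetteer lookup
        for m in re.finditer(re.escape(org), text):
            entities.append((m.group(), "ORG"))
    for m in DATE_PATTERN.finditer(text):          # hand-written date pattern
        entities.append((m.group(), "DATE"))
    return entities

print(rule_based_ner("Meeting at Google on March 15th."))
# [('Google', 'ORG'), ('March 15th', 'DATE')]
```

The weakness is obvious: a new organization name or an unusual date format silently falls through the rules.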
🟢 Knowledge Check (Quick Check)

A rule-based NER system uses a capitalization rule to find names: 'Capitalized words after periods are likely names.' It processes 'I love Paris. Great city.' What happens?

BIO Tagging

The BIO Tagging Scheme

Sentence: “Tim Cook works at Apple Inc in San Francisco”

Tim → B-PER | Cook → I-PER | works → O | at → O | Apple → B-ORG | Inc → I-ORG | in → O | San → B-LOC | Francisco → I-LOC

Why BIO? Because entities span multiple tokens! “San Francisco” = 2 tokens, 1 entity. B marks the START, I marks the CONTINUATION. Without the B/I distinction, “Tim Cook works at Apple” → are “Tim” and “Cook” 1 entity or 2?
BIO encodes entity boundaries: B=Beginning of entity, I=Inside entity, O=Outside (not an entity)
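Decoding BIO tags back into entity spans is purely mechanical. Here is a minimal sketch (the bio_to_entities helper is just an illustrative name):

```python
# Minimal sketch: collapsing (token, BIO-tag) pairs back into entity spans.
def bio_to_entities(tokens, tags):
    entities, current = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):                 # a new entity starts here
            if current:
                entities.append(tuple(current))
            current = [token, tag[2:]]
        elif tag.startswith("I-") and current:   # continue the open entity
            current[0] += " " + token
        else:                                    # O tag: close any open entity
            if current:
                entities.append(tuple(current))
            current = None
    if current:
        entities.append(tuple(current))
    return entities

tokens = ["Tim", "Cook", "works", "at", "Apple", "Inc", "in", "San", "Francisco"]
tags   = ["B-PER", "I-PER", "O", "O", "B-ORG", "I-ORG", "O", "B-LOC", "I-LOC"]
print(bio_to_entities(tokens, tags))
# [('Tim Cook', 'PER'), ('Apple Inc', 'ORG'), ('San Francisco', 'LOC')]
```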
CRF Models

CRF: Why Label Dependencies Matter

CRF: jointly optimizing the full label sequence

1. Independent model: P(tag₁|x) × P(tag₂|x) × ... × P(tagₙ|x)
Each tag is predicted independently — this can produce invalid sequences like I-PER right after B-LOC.

2. CRF model: P(tag₁, tag₂, ..., tagₙ | x₁, x₂, ..., xₙ)
Predicts the ENTIRE tag sequence jointly — guarantees valid sequences.

3. Score(y|x) = Σᵢ [emission(yᵢ, xᵢ) + transition(yᵢ₋₁, yᵢ)]
emission = how well a tag fits this word; transition = how well a tag follows the previous tag.

4. transition(B-PER → I-PER) = HIGH
Person names often span multiple words.

5. transition(I-PER → I-LOC) = IMPOSSIBLE
A location can't continue a person entity — the CRF prevents this (see the scoring sketch below).
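Here is a minimal sketch of that scoring formula in action; the emission and transition numbers are made up purely for illustration, since a real CRF learns them from data.

```python
# Minimal sketch: scoring a tag sequence as the sum of emission and transition
# scores, as in a linear-chain CRF. All numbers are made up for illustration.
emission = {                       # how well each tag fits each word
    ("New",  "B-LOC"): 4.0, ("New",  "O"): 0.5,
    ("York", "I-LOC"): 3.5, ("York", "B-LOC"): 2.0,
}
transition = {                     # how plausible tag_i is after tag_{i-1}
    ("B-LOC", "I-LOC"): 2.0,       # locations often span multiple words
    ("B-LOC", "B-LOC"): -1.0,      # starting a new LOC immediately is unusual
    ("B-LOC", "I-PER"): -1e9,      # effectively impossible: PER can't continue LOC
}

def sequence_score(words, tags):
    score = sum(emission.get((w, t), 0.0) for w, t in zip(words, tags))
    score += sum(transition.get((prev, cur), 0.0)
                 for prev, cur in zip(tags, tags[1:]))
    return score

words = ["New", "York"]
print(sequence_score(words, ["B-LOC", "I-LOC"]))   # 4.0 + 3.5 + 2.0 = 9.5
print(sequence_score(words, ["B-LOC", "B-LOC"]))   # 4.0 + 2.0 - 1.0 = 5.0
```

During training the CRF normalizes these scores over every possible tag sequence, and at prediction time the Viterbi algorithm finds the single highest-scoring sequence.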
🟡 Knowledge Check (Checkpoint)

A simple NER model (without CRF) labels 'New York City' as: New=B-LOC, York=B-LOC, City=O. What went wrong and how does CRF fix it?

Neural NER

Modern NER: BERT + CRF

📝 Tokens (Tim, Cook, ...) → 🧠 BERT Embeddings → 📊 Linear Projection → 🔗 CRF Transitions → 🏷️ BIO Tags (B-PER, I-PER, ...)
State-of-the-art NER: BERT contextual embeddings + CRF sequence labeling
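In practice you rarely wire this stack up by hand. Here is a minimal sketch using the Hugging Face transformers pipeline, where dslim/bert-base-NER is just one example of a publicly available BERT NER checkpoint; many off-the-shelf BERT token classifiers actually skip the CRF layer and rely on the per-token softmax alone.

```python
# Minimal sketch: pretrained BERT-based NER via the Hugging Face pipeline.
# "dslim/bert-base-NER" is one publicly available checkpoint, used here only
# as an example; most such checkpoints use a softmax head rather than a CRF.
from transformers import pipeline

ner = pipeline("token-classification",
               model="dslim/bert-base-NER",
               aggregation_strategy="simple")   # merge subword pieces into entities

for ent in ner("Tim Cook works at Apple Inc in San Francisco."):
    print(ent["word"], ent["entity_group"], round(float(ent["score"]), 3))
# e.g. Tim Cook PER ..., Apple Inc ORG ..., San Francisco LOC ...
```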
🔴 Knowledge Check (Challenge)

The sentence is: 'I ate an apple while reading about Apple on my Apple Watch.' How should a BERT-based NER model label each 'apple' / 'Apple' mention?

Applications

Real-World NER Applications

📧 Email (Gmail, Outlook)
• Auto-create calendar events
• Package tracking (dates, numbers)
• Contact extraction from signatures

🔍 Search (Google, Bing)
• Knowledge Graph population
• Query entity recognition
• Rich snippet extraction

💰 Finance
• Extract deal parties from contracts
• News-based trading signals
• Regulatory entity matching

🏥 Healthcare
• Drug-disease-symptom extraction
• Clinical note structuring
• Adverse event detection
NER powers extraction features across email, search, finance, and healthcare
🔴 Knowledge Check (Challenge)

Your NER system tags 'Apple' as ORG and 'Cupertino' as LOC correctly. But for 'Jordan played basketball,' it tags Jordan as LOC instead of PER. What's the core problem and fix?

🎓 What You Now Know

NER is sequence labeling, not classification — every token gets a label (B-PER, I-PER, O, etc.), making it fundamentally harder than document-level tasks.

BIO tagging handles multi-word entities — B marks where an entity starts, I marks continuation, O marks non-entities.

CRFs model label dependencies — preventing invalid sequences like I-PER following B-LOC, and helping multi-word entities stay together.

BERT revolutionized NER — contextual embeddings disambiguate “apple” (fruit) vs “Apple” (company) based on surrounding words, eliminating the need for hand-crafted features.

NER powers real products — Gmail auto-creating calendar events, Google’s Knowledge Graph, financial contract analysis, and clinical text mining all rely on NER.

Named Entity Recognition is how machines extract structure from chaos. Every email auto-categorized, every knowledge panel shown, every medical record digitized — NER is working quietly behind the scenes. 📍
