Naive Bayes — Why 'Stupid' Assumptions Work Brilliantly
A scroll-driven visual deep dive into Naive Bayes. Learn Bayes' theorem, why the 'naive' independence assumption is wrong but works anyway, and why it dominates spam filtering.
Probabilistic Classification
What if your assumption is completely wrong, but it still works?
Naive Bayes assumes all features are independent. They never are. But this “stupid” assumption produces classifiers that are shockingly fast and surprisingly accurate.
The Foundation: Bayes’ Theorem
Bayes' theorem for classification
P(class | features) = P(features | class) × P(class) / P(features)
P(features | class) — the likelihood: how likely are these specific features if we assume a given class? For spam detection, it asks: among all spam emails, how often do we see words like 'free' and 'money'?
P(class) — the prior: how common is this class before observing any features? If 1% of emails are spam, the prior is 0.01. This anchors predictions to real-world base rates.
P(features) — the evidence: the overall probability of seeing these features across all classes. Serves as a normalizing constant so posteriors sum to 1. Often ignored since we only need to compare classes.
In Bayes' theorem, P(spam | 'free money') is proportional to:
💡 Write out Bayes' formula and identify which terms are the likelihood and prior...
Bayes says P(spam|words) ∝ P(words|spam) × P(spam). The likelihood P('free money'|spam) asks: 'among all spam emails, how often do we see these words?' The prior P(spam) asks: 'what fraction of all emails are spam?' Both factors matter. Even if 'free money' is common in spam, if spam is very rare, the posterior might still be low.
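To see both factors at work, here's a tiny Python sketch with made-up numbers (the likelihoods and the 1% spam prior are illustrative assumptions, not real statistics): a phrase that's 40× more likely in spam still gets a posterior under 30% because spam itself is rare.

```python
# Toy Bayes' theorem calculation with made-up numbers (illustrative only).
p_words_given_spam = 0.40   # assumed P('free money' | spam): common in spam
p_words_given_ham  = 0.01   # assumed P('free money' | ham): rare in legitimate mail
p_spam = 0.01               # assumed prior: spam is very rare in this inbox
p_ham  = 0.99

# Numerators of Bayes' theorem: likelihood × prior
score_spam = p_words_given_spam * p_spam   # 0.004
score_ham  = p_words_given_ham * p_ham     # 0.0099

# Divide by P(words) = sum of the numerators so the posteriors sum to 1
evidence = score_spam + score_ham
print(f"P(spam | 'free money') = {score_spam / evidence:.3f}")  # ~0.288
print(f"P(ham  | 'free money') = {score_ham / evidence:.3f}")   # ~0.712
```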
The “Naive” Part
Why 'naive' matters mathematically
With the naive assumption, P(x₁, ..., xₙ | c) = P(x₁|c) · P(x₂|c) · ... · P(xₙ|c): the joint probability becomes a simple product of individual probabilities. Only n individual estimates needed — scales linearly with feature count.
Without it, P(x₁, x₂, ..., xₙ | c) must be estimated as a full joint distribution over all n features — exponentially many feature combinations to track. Impossible with realistic training data sizes, which is why the naive shortcut is essential.
The 'naive' assumption in Naive Bayes is that features are independent given the class. This assumption is:
💡 Does a classifier need exact probabilities, or just the right ordering?
In reality, features are almost never independent. In spam detection, 'free' and 'money' are highly correlated in spam. In medical diagnosis, symptoms cluster together. But for classification, we only need the correct class to have the HIGHEST probability — we don't need the exact probability values. The independence assumption often preserves this ranking even when it distorts the probabilities. This is why Naive Bayes 'works' despite being 'wrong.'
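Here's what that product looks like in code: a minimal sketch using invented per-word likelihoods and priors. It works in log space (summing logs instead of multiplying raw probabilities), a standard trick so that hundreds of tiny factors don't underflow to zero.

```python
import math

# Hypothetical per-word likelihoods P(word | class) and class priors,
# as if estimated from training counts (all values invented).
likelihoods = {
    "spam": {"free": 0.8, "money": 0.6, "meeting": 0.05},
    "ham":  {"free": 0.05, "money": 0.1, "meeting": 0.4},
}
priors = {"spam": 0.3, "ham": 0.7}

def naive_log_score(words, cls):
    # log P(c) + sum of log P(word | c): the naive product, done in log space.
    score = math.log(priors[cls])
    for word in words:
        score += math.log(likelihoods[cls][word])
    return score

email = ["free", "money"]
print(max(priors, key=lambda c: naive_log_score(email, c)))  # -> 'spam'
```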
Gaussian vs Multinomial vs Bernoulli
For classifying news articles into topics using word counts (bag of words), which Naive Bayes variant should you use?
💡 Word counts are integers (0, 1, 2, 3...). Which distribution models counts?
Multinomial NB models the probability of seeing specific word counts. If 'elections' appears 5 times in a politics article and 0 times in a sports article, the multinomial model captures this difference in frequency. Bernoulli only knows if a word is present or absent (losing count information). Gaussian assumes word counts follow a normal distribution (they don't — they follow a multinomial/Poisson distribution).
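In scikit-learn this takes a few lines. The sketch below uses a toy four-document corpus with made-up topics purely for illustration; a real setup would train on far more data and hold out a test set.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy corpus: four invented "articles" and their topics.
docs = [
    "elections vote senate campaign elections",
    "goal match striker goal penalty",
    "parliament vote policy minister",
    "coach team match season",
]
labels = ["politics", "sports", "politics", "sports"]

vec = CountVectorizer()               # bag of words: integer counts per word
X = vec.fit_transform(docs)
clf = MultinomialNB()                 # alpha=1.0 (Laplace smoothing) by default
clf.fit(X, labels)

print(clf.predict(vec.transform(["vote in the elections"])))  # ['politics']
```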
The Killer App: Spam Detection
Spam classification example
P(spam | 'free money') ∝ P('free'|spam) · P('money'|spam) · P(spam) = 0.8 × 0.6 × 0.3 = 0.144
P(ham | 'free money') ∝ P('free'|ham) · P('money'|ham) · P(ham) = 0.05 × 0.1 × 0.7 = 0.0035
0.144 ≫ 0.0035 → Predict SPAM ✓
A word has NEVER appeared in any spam email during training. When it appears in a new email, what happens to P(spam|email)?
💡 What does 0 × anything equal? Now think about multiplying all the word probabilities...
Because Naive Bayes multiplies probabilities, ONE zero term makes the ENTIRE product zero. This is called the 'zero frequency problem.' The fix is Laplace smoothing: add a small count (usually 1) to every word's count so no probability is ever exactly zero. P(word|class) = (count + 1) / (total + vocabulary_size). This is essential in practice.
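Here's that formula as a short runnable sketch, with invented training tokens: the word 'schedule' never appears in spam, yet its smoothed probability stays above zero.

```python
from collections import Counter

# Hypothetical training tokens for each class (invented for illustration).
spam_tokens = ["free", "money", "free", "winner"]
ham_tokens  = ["meeting", "schedule", "project", "free"]

vocab = set(spam_tokens) | set(ham_tokens)   # 6 distinct words
spam_counts = Counter(spam_tokens)

def smoothed_prob(word, counts, total_tokens):
    # Laplace smoothing: P(word|class) = (count + 1) / (total + vocabulary_size),
    # so no word ever gets probability exactly zero.
    return (counts[word] + 1) / (total_tokens + len(vocab))

# Unseen in spam, but still nonzero:
print(smoothed_prob("schedule", spam_counts, len(spam_tokens)))  # 1 / 10 = 0.1
print(smoothed_prob("free", spam_counts, len(spam_tokens)))      # (2 + 1) / 10 = 0.3
```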
When Naive Bayes Shines (and When It Doesn’t)
🎓 What You Now Know
✓ Bayes’ theorem flips the question — Instead of P(features|class), compute P(class|features) using the prior and likelihood.
✓ The “naive” assumption is always wrong — But classifiers only need correct rankings, not correct probabilities.
✓ Three variants for three data types — Gaussian (continuous), Multinomial (counts), Bernoulli (binary).
✓ Laplace smoothing prevents zero probabilities — One zero kills the entire product.
✓ Unbeatable for text classification baselines — Spam filtering, sentiment analysis, document classification.
Naive Bayes is a masterclass in the power of simplicity. It proves that a fast, interpretable model with bad assumptions can outperform a slow, complex model with good ones — especially when data is scarce. 🚀
↗ Keep Learning
Logistic Regression — The Classifier That's Not Really Regression
A scroll-driven visual deep dive into logistic regression. Learn how a regression model becomes a classifier, why the sigmoid is the key, and how log-loss trains it.
Decision Trees — How Machines Learn to Ask Questions
A scroll-driven visual deep dive into decision trees. Learn how trees split data, what Gini impurity and information gain mean, and why trees overfit like crazy.
Accuracy, Precision, Recall & F1 — Choosing the Right Metric
A scroll-driven visual deep dive into classification metrics. Learn why accuracy misleads, what precision and recall actually measure, and when to use F1, F2, or something else entirely.