Accuracy, Precision, Recall & F1 — Choosing the Right Metric
A scroll-driven visual deep dive into classification metrics. Learn why accuracy misleads, what precision and recall actually measure, and when to use F1, F2, or something else entirely.
99% accuracy.
Completely useless.
A model that predicts “no cancer” for every patient achieves 99% accuracy if only 1% have cancer. Accuracy hides what matters. Precision tells you how trustworthy positive predictions are. Recall tells you how many positives you actually catch. F1 balances both. Choosing the right metric is choosing what errors you can afford.
The Accuracy Paradox
Accuracy and its limits
Accuracy measures the fraction of all predictions that are correct — both positive and negative combined. It's the most intuitive metric but can be dangerously misleading on imbalanced datasets.
Accuracy = (TP + TN) / (TP + TN + FP + FN)

On a 99/1 class split, predicting ALL samples as negative yields 99% accuracy while doing nothing useful. Accuracy hides the failure because it weights every prediction equally, so the overwhelming majority class dominates the score.
To evaluate performance on the minority (positive) class, you need precision (how trustworthy are positive predictions?) and recall (how many actual positives did we catch?).
A fraud detection model processes 10,000 transactions: 9,900 legitimate, 100 fraudulent. It predicts ALL transactions as legitimate. What's its accuracy?
💡 How many of the 10,000 predictions are correct if you always predict legitimate?
9,900/10,000 = 99% accuracy. But recall for fraud = 0/100 = 0%. This model is useless for its intended purpose despite 'high accuracy.' In fraud detection, catching the 100 bad transactions is the ENTIRE point. This is why you must use precision, recall, and F1 for imbalanced problems — accuracy hides the failures that matter most.
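You can replay this quiz in a few lines. The snippet below is a minimal sketch, assuming scikit-learn is installed, scoring the always-legitimate "model" on the same 10,000 transactions:

```python
# The fraud quiz above, replayed in code: 10,000 transactions, 100 fraudulent,
# and a "model" that labels every transaction as legitimate (class 0).
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = [1] * 100 + [0] * 9_900   # 1 = fraud, 0 = legitimate
y_pred = [0] * 10_000              # always predict "legitimate"

print(accuracy_score(y_true, y_pred))                    # 0.99 -> looks impressive
print(recall_score(y_true, y_pred))                      # 0.0  -> catches zero fraud
print(precision_score(y_true, y_pred, zero_division=0))  # 0.0  -> it never predicts fraud at all
```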
The Two Questions Every Classifier Must Answer
A cancer screening test has 95% recall and 10% precision. What does this mean in plain English?
💡 In medicine, what's worse: telling a healthy person to get a follow-up test, or telling a cancer patient they're fine?
In plain English: the test catches 95% of people who actually have cancer, but only 10% of the people it flags turn out to have it. For screening, recall is king: you want to catch EVERY cancer, even if it means many false alarms. Those false positives go through follow-up testing (biopsy, etc.). Missing a cancer (false negative) could be fatal. So 95% recall + 10% precision is actually a reasonable screening test! The follow-up test should have HIGH PRECISION to confirm which of those positives are real. This two-stage pipeline (high-recall screen → high-precision confirmation) is standard in medicine.
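To make those two numbers concrete, here is a small back-of-the-envelope check. The counts are hypothetical, chosen only so that they reproduce 95% recall and 10% precision:

```python
# Hypothetical screening counts that produce 95% recall and 10% precision.
tp = 950     # people with cancer who were flagged
fn = 50      # people with cancer who were missed
fp = 8_550   # healthy people flagged for follow-up

recall = tp / (tp + fn)       # 950 / 1000 = 0.95 -> we catch 95% of cancers
precision = tp / (tp + fp)    # 950 / 9500 = 0.10 -> only 10% of flags are real
print(f"recall={recall:.2f}, precision={precision:.2f}")
```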
F1: Balancing Precision and Recall
The F1 score and its variants
F1 is the harmonic mean of precision and recall. Unlike the arithmetic mean, it heavily penalizes imbalance between the two: if either metric is near zero, F1 collapses to near zero.
F1 = 2 × (Precision × Recall) / (Precision + Recall)

With Precision = 1.0 and Recall = 0.0, the arithmetic mean would be 0.5 (looks okay), but F1 = 0 (reveals the failure). The harmonic mean forces BOTH metrics to be high.
F-beta lets you weight precision vs recall asymmetrically. β=1 is standard F1 (balanced), β=2 weights recall twice as much, β=0.5 weights precision twice as much.
F_β = (1 + β²) × (P × R) / (β² × P + R)

Use F2 when missing positives is dangerous (cancer screening — catch every case even at the cost of false alarms). Use F0.5 when false positives are costly (spam filtering — don't lose real emails).
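The sketch below implements the F-beta formula above directly in plain Python and shows how β shifts the score between a recall-heavy and a precision-heavy classifier (the two precision/recall pairs are illustrative):

```python
# Direct implementation of the F-beta formula above.
def f_beta(p: float, r: float, beta: float) -> float:
    if p == 0 and r == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)

# A recall-heavy classifier (P=0.3, R=0.9) vs a precision-heavy one (P=0.9, R=0.3).
for p, r in [(0.3, 0.9), (0.9, 0.3)]:
    print(f"P={p}, R={r}: "
          f"F0.5={f_beta(p, r, 0.5):.3f}, F1={f_beta(p, r, 1):.3f}, F2={f_beta(p, r, 2):.3f}")
# F1 scores both classifiers identically (0.450); F2 favors the recall-heavy one
# (0.643 vs 0.346) and F0.5 favors the precision-heavy one (0.346 vs 0.643).
```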
Precision = 0.6, Recall = 0.6. What's the F1 score? Now: Precision = 0.9, Recall = 0.3. What's the F1?
💡 Harmonic mean = 2ab/(a+b). Try both pairs...
F1(0.6, 0.6) = 2(0.6)(0.6)/(0.6+0.6) = 0.72/1.2 = 0.6. F1(0.9, 0.3) = 2(0.9)(0.3)/(0.9+0.3) = 0.54/1.2 = 0.45. The arithmetic mean of 0.9 and 0.3 is 0.6 — but F1 is 0.45 because the harmonic mean heavily penalizes the imbalance. This is by design: a model with 90% precision but only 30% recall is NOT as good as one with 60%/60%. The harmonic mean captures this intuition.
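The same arithmetic in code, comparing the harmonic mean (F1) against the arithmetic mean for both pairs from the question:

```python
# Harmonic mean (F1) vs arithmetic mean for the two precision/recall pairs above.
for p, r in [(0.6, 0.6), (0.9, 0.3)]:
    arithmetic = (p + r) / 2
    f1 = 2 * p * r / (p + r)
    print(f"P={p}, R={r}: arithmetic mean={arithmetic:.2f}, F1={f1:.2f}")
# P=0.6, R=0.6: arithmetic mean=0.60, F1=0.60
# P=0.9, R=0.3: arithmetic mean=0.60, F1=0.45
```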
You Can’t Maximize Both
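Precision and recall are coupled through the decision threshold: flag more cases and recall rises while precision falls; flag fewer and the reverse happens. The sketch below illustrates the tradeoff on synthetic, imbalanced data (assumptions: scikit-learn is available, and the dataset and logistic regression model are placeholders, not a recommendation):

```python
# A minimal sketch of the precision/recall tradeoff as the threshold moves.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5_000, weights=[0.95], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Predicted probability of the positive class for each test sample.
scores = LogisticRegression(max_iter=1_000).fit(X_train, y_train).predict_proba(X_test)[:, 1]

# Sweep the decision threshold: a low threshold flags more positives (recall up,
# precision down); a high threshold flags fewer (precision up, recall down).
for t in (0.1, 0.3, 0.5, 0.7, 0.9):
    preds = (scores >= t).astype(int)
    tp = int(np.sum((preds == 1) & (y_test == 1)))
    fp = int(np.sum((preds == 1) & (y_test == 0)))
    fn = int(np.sum((preds == 0) & (y_test == 1)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    print(f"threshold={t:.1f}  precision={precision:.2f}  recall={recall:.2f}")
```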
Which Metric for Which Problem?
🎓 What You Now Know
✓ Accuracy lies on imbalanced data — Always check class distribution first.
✓ Precision = trust in positive predictions — Minimize false alarms (FP).
✓ Recall = coverage of actual positives — Minimize missed cases (FN).
✓ F1 = harmonic mean, punishes imbalance — Use F2 for recall-heavy, F0.5 for precision-heavy.
✓ Match metric to business cost — Missing cancer ≠ missing spam. Choose accordingly.
The metric you choose IS your optimization objective. A model optimized for accuracy on imbalanced data will learn to ignore the minority class. A model optimized for recall will find every positive at the cost of false alarms. There’s no “best” metric — only the right metric for YOUR problem. 📏
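In practice, this choice shows up as the scoring argument you hand to model selection. A minimal sketch, assuming scikit-learn and using a placeholder model and dataset:

```python
# The metric passed as `scoring` is what cross-validation reports and what
# model selection (e.g., a grid search) would optimize.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=5_000, weights=[0.95], random_state=0)
model = LogisticRegression(max_iter=1_000)

for metric in ("accuracy", "precision", "recall", "f1"):
    score = cross_val_score(model, X, y, scoring=metric, cv=5).mean()
    print(f"{metric:>9}: {score:.3f}")
# On imbalanced data, accuracy looks flattering while recall and F1 expose
# how much of the minority class the model actually catches.
```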
↗ Keep Learning
Confusion Matrix Deep Dive — What Your Model Gets Wrong and Why
A scroll-driven deep dive into the confusion matrix. Master TP, TN, FP, FN, and learn to derive every classification metric from a single 2×2 table.
ROC Curves & AUC — Measuring Classifier Performance Visually
A scroll-driven visual deep dive into ROC curves and AUC. Learn TPR vs FPR, why AUC is threshold-independent, and when to use ROC vs PR curves.
Logistic Regression — The Classifier That's Not Really Regression
A scroll-driven visual deep dive into logistic regression. Learn how a regression model becomes a classifier, why the sigmoid is the key, and how log-loss trains it.