Confusion Matrix Deep Dive — What Your Model Gets Wrong and Why
A scroll-driven deep dive into the confusion matrix. Master TP, TN, FP, FN, and learn to derive every classification metric from a single 2×2 table.
Model Evaluation
Four cells.
Every classification metric.
Accuracy, precision, recall, F1, specificity, balanced accuracy, MCC — every single one of these metrics is derived from just four numbers arranged in a 2×2 table. Master the confusion matrix and you’ll never be confused by metrics again.
The 2×2 Table
A cancer screening test gives 100 results: TP=40, FN=10, FP=5, TN=45. How many actual cancer cases were there?
💡 Actual positives include those caught (TP) and those missed (FN).
Actual positives = TP + FN. These are all the people who ACTUALLY have cancer, regardless of what the model predicted. TP (40) were correctly identified, FN (10) were missed. Total = 50 actual cancer cases. Similarly, actual negatives = FP + TN = 5 + 45 = 50.
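If you want to see those row and column sums in code, here's a minimal Python sketch using the quiz numbers above (the variable names are just for illustration):

```python
# The cancer-screening counts from the quiz above.
TP, FN, FP, TN = 40, 10, 5, 45

actual_positives = TP + FN     # truly have cancer: caught plus missed
actual_negatives = FP + TN     # truly healthy: false alarms plus correct negatives
predicted_positives = TP + FP  # everything the model flagged
predicted_negatives = FN + TN  # everything the model cleared

print(actual_positives, actual_negatives)        # 50 50
print(predicted_positives, predicted_negatives)  # 45 55
```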
Every Metric From Four Numbers
Metrics derived from the confusion matrix
- Accuracy = (TP + TN) / (TP + TN + FP + FN). Overall correctness — the fraction of all predictions that were right. Sounds good, but it's misleading on imbalanced data, where always predicting the majority class gets high accuracy.
- Precision = TP / (TP + FP). Of all the positive PREDICTIONS, how many were actually correct? Measures how trustworthy your alarms are — high precision means few false alarms.
- Recall = TP / (TP + FN). Of all the actual POSITIVES, how many did the model find? Measures detection thoroughness — high recall means few missed cases.
- Specificity = TN / (TN + FP). Of all the actual NEGATIVES, how many were correctly identified as negative? The 'recall for the negative class' — important when false positives are costly.
- F1 = 2 × (Precision × Recall) / (Precision + Recall). The harmonic mean of precision and recall, balancing both into a single number. Useful when you can't afford to ignore either false positives or false negatives.
- MCC = (TP×TN − FP×FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)). Matthews Correlation Coefficient: the most balanced metric, using all four quadrants symmetrically. Ranges from −1 to +1, where 0 means no better than random — robust even on imbalanced data.

A model predicts 'not fraud' for ALL 10,000 transactions. 9,900 are legitimate, 100 are fraud. What's its accuracy and recall?
💡 If the model never predicts 'fraud', how many true positives does it have?
If the model predicts everything as 'not fraud': TP=0, FN=100, FP=0, TN=9900. Accuracy = (0+9900)/10000 = 99% — looks great! But Recall = TP/(TP+FN) = 0/(0+100) = 0% — it caught ZERO fraud cases. This is the classic accuracy trap on imbalanced data. The confusion matrix instantly reveals the problem that accuracy hides.
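Those definitions translate directly into code. The sketch below is a minimal, self-contained version (returning 0.0 when a denominator is zero is a simplifying convention for this sketch, not a standard), applied to the all-'not fraud' model:

```python
from math import sqrt

def metrics_from_counts(TP, FN, FP, TN):
    """Every metric above, derived from the four cells of the confusion matrix."""
    def safe(num, den):
        # Guard degenerate cases (e.g. a model that never predicts positive)
        # instead of raising ZeroDivisionError.
        return num / den if den else 0.0

    precision = safe(TP, TP + FP)
    recall    = safe(TP, TP + FN)
    return {
        "accuracy":    safe(TP + TN, TP + TN + FP + FN),
        "precision":   precision,
        "recall":      recall,
        "specificity": safe(TN, TN + FP),
        "f1":          safe(2 * precision * recall, precision + recall),
        "mcc":         safe(TP * TN - FP * FN,
                            sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))),
    }

# The all-'not fraud' model: 99% accuracy, 0% recall.
print(metrics_from_counts(TP=0, FN=100, FP=0, TN=9900))
```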
Beyond 2×2: Multi-Class Confusion Matrices
For K classes, the confusion matrix is K×K. Each cell (i, j) shows how many samples from class i were predicted as class j.
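As a quick illustration, scikit-learn's confusion_matrix returns exactly this K×K layout. The labels and predictions below are made up, and scikit-learn is assumed to be installed:

```python
from sklearn.metrics import confusion_matrix

# Toy 3-class example: rows are the actual class, columns the predicted class,
# in the order given by `labels`. The diagonal holds the correct predictions.
y_true = ["cat", "cat", "cat", "dog", "dog", "bird", "bird", "bird", "bird"]
y_pred = ["cat", "cat", "dog", "dog", "cat", "bird", "bird", "bird", "cat"]

print(confusion_matrix(y_true, y_pred, labels=["cat", "dog", "bird"]))
```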
Averaging strategies
- Micro-averaging: pool all TP, FP, and FN across classes globally, then compute the metric once. Treats every sample equally, giving more weight to larger classes.
- Macro-averaging: compute the metric independently for each class, then take the unweighted mean. Treats every CLASS equally, so poor performance on rare classes drags down the average.
- Weighted averaging: like macro-averaging, but each class's metric is weighted by its support (number of samples). A compromise between micro and macro that accounts for class frequency.
In a 3-class problem (A:1000, B:50, C:50), Class B has F1=0.2, the others have F1=0.95. Macro-F1 vs Micro-F1: which is lower?
💡 Macro = unweighted mean across classes. Micro = global pool of TP/FP/FN.
Macro-F1 = (0.95 + 0.2 + 0.95) / 3 = 0.70. Micro-F1 will be dominated by Class A (1000 samples) where F1=0.95, pulling the micro-average up toward ~0.93. Macro-average penalizes poor performance on minority classes much more harshly. If you care about every class equally (regardless of size), report macro. If you care proportionally, report weighted or micro.
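Here's a hedged sketch of the three averaging modes using scikit-learn's f1_score; the toy labels are illustrative only and don't reproduce the exact class sizes from the quiz:

```python
from sklearn.metrics import f1_score

# Toy 3-class labels: class A dominates, class B is where the model struggles.
y_true = ["A", "A", "A", "A", "A", "A", "B", "B", "C", "C"]
y_pred = ["A", "A", "A", "A", "A", "B", "A", "B", "C", "C"]

for average in ("micro", "macro", "weighted"):
    print(average, f1_score(y_true, y_pred, average=average))
# micro is pulled up by dominant class A; macro drops because class B's F1 is poor.
```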
Pitfalls to Avoid
Common pitfalls checklist:
- Reporting accuracy on imbalanced data — use the confusion matrix to show the FULL picture
- Ignoring off-diagonal patterns — in multi-class, which classes get confused with each other? The confusion matrix reveals systematic errors (e.g., always confusing “cat” with “dog”)
- Forgetting about prevalence — precision depends on class balance; compare models on the SAME test set
- Using a fixed threshold — the confusion matrix is threshold-dependent; consider sweeping thresholds (ROC curve)
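On that last point, here's a hedged sketch of a threshold sweep. It assumes you already have a fitted binary classifier clf with predict_proba and a held-out set (X_test, y_test); those names are placeholders:

```python
from sklearn.metrics import confusion_matrix

def confusion_by_threshold(clf, X_test, y_test, thresholds=(0.3, 0.5, 0.7)):
    """Print the confusion matrix at each decision threshold, not just 0.5."""
    scores = clf.predict_proba(X_test)[:, 1]   # probability of the positive class
    for t in thresholds:
        y_pred = (scores >= t).astype(int)     # higher threshold, fewer positives
        print(f"threshold={t}")
        print(confusion_matrix(y_test, y_pred))
```

sklearn.metrics.roc_curve performs the same sweep implicitly when it traces the true-positive rate against the false-positive rate across thresholds.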
Practical Usage
You built a model to detect rare disease (1% prevalence). Your confusion matrix shows TP=8, FN=2, FP=50, TN=940. What's the precision and what does it mean practically?
💡 Precision = TP/(TP+FP). Count the false positives.
Precision = TP/(TP+FP) = 8/(8+50) = 8/58 = 13.8%. Despite catching 80% of actual cases (recall = 8/10 = 80%), the model generates ~6 false alarms per true case. In a medical setting, each false positive means unnecessary anxiety, follow-up tests, and costs. The confusion matrix reveals this tradeoff that accuracy (948/1000 = 94.8%) completely hides. This is why precision and recall matter more than accuracy on imbalanced data.
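To double-check the arithmetic, here's a small sketch (scikit-learn assumed) that rebuilds label arrays matching those four counts and recomputes the metrics; the ordering of the samples is arbitrary:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Reconstruct labels consistent with TP=8, FN=2, FP=50, TN=940.
y_true = [1] * 10 + [0] * 990                          # 10 actual cases, 990 healthy
y_pred = ([1] * 8 + [0] * 2) + ([1] * 50 + [0] * 940)  # predictions in the same order

print(accuracy_score(y_true, y_pred))   # 0.948
print(precision_score(y_true, y_pred))  # ~0.138
print(recall_score(y_true, y_pred))     # 0.8
```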
🎓 What You Now Know
✓ The confusion matrix has 4 cells — TP, FN, FP, TN. Every sample goes in exactly one.
✓ Every metric derives from these 4 numbers — Accuracy, precision, recall, F1, specificity, MCC.
✓ Accuracy can be misleading — Always check the confusion matrix on imbalanced data.
✓ Multi-class: use one-vs-all — Then aggregate with micro, macro, or weighted averaging.
✓ MCC is the most balanced metric — Uses all 4 cells symmetrically. Report it.
The confusion matrix is the foundation of classification evaluation. Before computing ANY metric, print the confusion matrix. It tells you not just HOW OFTEN the model is wrong, but HOW it’s wrong — and that’s what matters for improvement. 🔍
↗ Keep Learning
Accuracy, Precision, Recall & F1 — Choosing the Right Metric
A scroll-driven visual deep dive into classification metrics. Learn why accuracy misleads, what precision and recall actually measure, and when to use F1, F2, or something else entirely.
ROC Curves & AUC — Measuring Classifier Performance Visually
A scroll-driven visual deep dive into ROC curves and AUC. Learn TPR vs FPR, why AUC is threshold-independent, and when to use ROC vs PR curves.
Logistic Regression — The Classifier That's Not Really Regression
A scroll-driven visual deep dive into logistic regression. Learn how a regression model becomes a classifier, why the sigmoid is the key, and how log-loss trains it.