Cross-Validation & Hyperparameter Tuning — How to Actually Evaluate Models
A scroll-driven visual deep dive into cross-validation and hyperparameter tuning. Learn K-fold CV, stratified splitting, grid search, random search, and Bayesian optimization.
Your test accuracy is a lie.
Cross-validation tells the truth.
One train/test split gives you one number. That number could be lucky or unlucky. Cross-validation gives you a distribution — mean accuracy AND confidence interval. Combined with smart hyperparameter search, it’s how professionals evaluate and tune models.
The Problem with a Single Split
What could go wrong?
Split 1: test accuracy = 92%. Easy test examples land in the held-out set by chance, inflating the accuracy to 92%. You'd think your model is great — but it was just a favorable draw.
Split 2: test accuracy = 81%. Hard examples concentrate in the test set, making accuracy drop to 81%. Same model, same data — wildly different result due to the random partition.
Split 3: test accuracy = 86%. A more typical partition gives 86% — somewhere between lucky and unlucky, but you still wouldn't know that without trying other splits.
True performance ≈ 86% ± 5%. Only by evaluating on ALL splits can you estimate the mean (≈86%) AND the uncertainty (±5%). A single number without a confidence interval is almost meaningless.
You train a model once, test it once, and get 94% accuracy. You report '94% accuracy' in your paper. What's wrong?
💡 How confident can you be in a single measurement?
A single test result is a data point, not a conclusion. Cross-validation (say, 5-fold) would give you 5 test scores: maybe 94%, 89%, 92%, 91%, 90%. Now you can report 91.2% ± 1.8% — which is more honest and useful than a single 94%. Reviewers and practitioners trust CV results over single splits because they show both the estimate AND the uncertainty.
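To make this concrete, here is a minimal sketch, assuming scikit-learn and a synthetic dataset from make_classification (the logistic-regression model is just a placeholder): a single train/test split yields one number, while 5-fold cross-validation yields five scores you can summarize as a mean and a spread.

```python
# Minimal sketch: a single split vs. 5-fold cross-validation.
# Dataset and model are placeholders (make_classification + LogisticRegression).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
model = LogisticRegression(max_iter=1000)

# One split -> one number, heavily dependent on the random partition.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
single_score = model.fit(X_tr, y_tr).score(X_te, y_te)

# 5-fold CV -> five numbers, so you can report a mean AND a spread.
cv_scores = cross_val_score(model, X, y, cv=5)
print(f"single split: {single_score:.3f}")
print(f"5-fold CV:    {cv_scores.mean():.3f} ± {cv_scores.std():.3f}")
```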
K-Fold Cross-Validation
K-Fold CV math
The cross-validation score is the average of scores from each fold. This gives a more robust estimate than any single train/test split.
CV score = (1/K) Σᵢ score(foldᵢ)
K=5 is the most common default: fast, with 80% of the data available for training in each fold. K=10 gives lower variance. LOOCV (K=N) uses the maximum training data per fold but is expensive and high-variance.
Every data point appears in the test set exactly once across all folds. No data is wasted — you get a test prediction for every single sample.
In 5-fold CV, how much of the data does each model train on?
💡 If there are 5 folds and 1 is held out for testing, how many folds are used for training?
In K-fold CV, each model trains on (K−1)/K of the data. For K=5: 4/5 = 80%. For K=10: 9/10 = 90%. Higher K → more training data per fold → lower bias but higher variance (folds overlap more) and more compute. K=5 is the most common default: 80% training is substantial, and 5 fits are fast enough for most pipelines.
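A small sketch to make the fold arithmetic concrete, assuming scikit-learn's KFold and a toy array of 100 samples: with K=5, every model trains on 80 of the 100 samples, and each sample lands in a test fold exactly once.

```python
# Sketch: fold sizes and test coverage for K=5 (scikit-learn KFold).
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(100).reshape(-1, 1)  # 100 toy samples
kf = KFold(n_splits=5, shuffle=True, random_state=0)

times_tested = np.zeros(len(X), dtype=int)
for fold, (train_idx, test_idx) in enumerate(kf.split(X)):
    # Each model trains on (K-1)/K of the data: 80 of 100 samples here.
    print(f"fold {fold}: train={len(train_idx)}, test={len(test_idx)}")
    times_tested[test_idx] += 1

# Every sample appears in a test fold exactly once across the K folds.
assert (times_tested == 1).all()
```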
Choosing the Right CV Strategy
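The summary at the end of this article calls out two of the most common matches: stratified splits for classification and forward-chaining splits for time series. Here is a minimal sketch of both, assuming scikit-learn's StratifiedKFold and TimeSeriesSplit on synthetic data.

```python
# Sketch: matching the splitter to the data structure (scikit-learn, toy data).
import numpy as np
from sklearn.model_selection import StratifiedKFold, TimeSeriesSplit

# Imbalanced classification: stratification preserves the class ratio per fold.
y = np.array([0] * 90 + [1] * 10)            # 10% positive class
X = np.random.randn(len(y), 3)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X, y):
    print("positives in test fold:", y[test_idx].sum())  # 2 per fold

# Temporal data: always train on the past and test on the future (no shuffling).
X_time = np.arange(60).reshape(-1, 1)
tscv = TimeSeriesSplit(n_splits=5)
for train_idx, test_idx in tscv.split(X_time):
    print(f"train ends at t={train_idx[-1]}, test spans t={test_idx[0]}..{test_idx[-1]}")
```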
Hyperparameter Tuning: Finding the Best Settings
You use GridSearchCV to find the best hyperparameters and report the best CV score. Is there a subtle problem?
💡 Selection bias: if you pick the maximum of 100 random numbers, it's higher than average...
This is a critical pitfall! If you try 100 hyperparameter combinations and pick the one with the best CV score, that score is optimistically biased — you've effectively 'fit' the validation data by selecting the best combination. The solution is a three-way split: training and validation data used for CV and tuning, plus a final held-out TEST set that is NEVER touched until the very end. That test set gives you an unbiased estimate of performance.
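A hedged sketch of that workflow, assuming scikit-learn's GridSearchCV with an SVC and a synthetic dataset (both placeholders): tune with CV inside the development portion, then score the untouched test set exactly once.

```python
# Sketch: tune with CV on the development data, then touch the final test set once.
# SVC, the grid, and the synthetic dataset are illustrative placeholders.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Carve out the final test set BEFORE any tuning happens.
X_dev, X_test, y_dev, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.001]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X_dev, y_dev)

print("best CV score (optimistically biased):", search.best_score_)
print("estimate on the untouched test set:   ", search.score(X_test, y_test))
```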
Beyond Grid Search
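As a sketch of the random-search alternative highlighted in the summary below, here is scikit-learn's RandomizedSearchCV sampling a fixed budget of configurations from integer and continuous distributions (the random-forest model and scipy distributions are illustrative choices, not the article's). Bayesian optimization would replace the random sampler with a surrogate model and is not shown here.

```python
# Sketch: random search samples a fixed budget of configurations instead of
# enumerating a full grid. Model and distributions are illustrative choices.
from scipy.stats import loguniform, randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

param_distributions = {
    "n_estimators": randint(100, 500),
    "max_depth": randint(3, 20),
    "min_samples_leaf": randint(1, 10),
    "max_features": loguniform(0.1, 1.0),  # fraction of features per split
}

# 30 sampled configurations, each scored with 5-fold CV.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions,
    n_iter=30,
    cv=5,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```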
🎓 What You Now Know
✓ Single splits are unreliable — Use K-fold CV (K=5 or 10) for robust estimates.
✓ Stratified for classification, time-series for temporal — Match CV to your data structure.
✓ Grid search is exhaustive but expensive — Use random search for 4+ hyperparameters.
✓ Always keep a final held-out test set — Never use it for tuning or selection.
✓ Bayesian optimization for expensive models — Learns which regions of hyperspace are promising.
Model evaluation isn’t glamorous, but it’s what separates rigorous ML from guesswork. A mediocre model properly evaluated is more trustworthy than a brilliant model tested on one lucky split. Build the habit: every result you report should come from cross-validation. 📊
📄 Random Search for Hyper-Parameter Optimization (Bergstra & Bengio, 2012)
↗ Keep Learning
Bias-Variance Tradeoff — The Most Important Concept in ML
A scroll-driven visual deep dive into the bias-variance tradeoff. Learn why every model makes errors, how underfitting and overfitting emerge, and how to balance them.
Accuracy, Precision, Recall & F1 — Choosing the Right Metric
A scroll-driven visual deep dive into classification metrics. Learn why accuracy misleads, what precision and recall actually measure, and when to use F1, F2, or something else entirely.