
MSE, MAE, R-Squared and Beyond — Regression Metrics That Actually Matter

A scroll-driven deep dive into regression metrics. Understand MSE, RMSE, MAE, MAPE, R-squared, and Adjusted R-squared — when to use each, their gotchas, and how to report results properly.

Introduction

Model Evaluation

How wrong is your model?
Let’s quantify it.

Classification has accuracy. Regression has a zoo of metrics — MSE, RMSE, MAE, MAPE, R², Adjusted R². Each answers a slightly different question about how far off your predictions are. Pick the wrong one and you’ll optimize for the wrong thing.

Residuals

It All Starts With Residuals

Definition of a residual

📏 Residual

The difference between what actually happened and what the model predicted. It's the fundamental unit of error in regression — every metric is a function of residuals.

eᵢ = yᵢ − ŷᵢ
⬆️ Positive Residual

When the residual is positive, the actual value was higher than predicted — the model under-predicted and guessed too low.

eᵢ > 0 → under-prediction
⬇️ Negative Residual

When the residual is negative, the actual value was lower than predicted — the model over-predicted and guessed too high.

eᵢ < 0 → over-prediction
🎯 The Goal

Make residuals small AND unstructured. If there are patterns in your residuals, the model hasn't captured all the learnable signal in the data.
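Here is a minimal NumPy sketch of the idea (the prices are invented for illustration): positive residuals flag under-predictions, negative ones flag over-predictions.

```python
import numpy as np

# Hypothetical house prices in $K; values invented for illustration.
y_true = np.array([350, 280, 410, 300])   # what actually happened
y_pred = np.array([300, 290, 405, 330])   # what the model predicted

residuals = y_true - y_pred               # eᵢ = yᵢ − ŷᵢ
print(residuals)                          # [ 50 -10   5 -30]
# 50 → under-prediction (guessed too low); -30 → over-prediction (too high)
```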

🟢 Quick Check

Your model predicts house price = $300K. Actual price = $350K. What is the residual?

MSE & MAE

MSE, RMSE, and MAE

The core regression metrics

📏 MAE

Mean Absolute Error — the average absolute residual, reported in original units (e.g., dollars). Treats all errors equally and is robust to outliers, making it the default choice.

MAE = (1/n) Σ|yᵢ − ŷᵢ|
📐 MSE

Mean Squared Error — the average squared residual. By squaring, it penalizes large errors much more heavily than small ones. A few big misses will dominate this metric.

MSE = (1/n) Σ(yᵢ − ŷᵢ)²
📊 RMSE

Root Mean Squared Error — the square root of MSE, bringing it back to original units. Still sensitive to outliers like MSE, but easier to interpret since the units match your target variable.

RMSE = √MSE
💡 Key Relationship

MSE is always greater than or equal to MAE squared (Jensen's inequality). This means squaring amplifies large errors — a few big misses inflate MSE far more than MAE.

MSE ≥ MAE²
No outliers: errors 1, 2, 1, 3, 2 → MAE = 1.8, MSE = 3.8, RMSE = 1.95 (MAE ≈ RMSE)
One outlier: errors 1, 2, 1, 3, 20 → MAE = 5.4 (3× ↑), MSE = 83.0 (22× ↑!), RMSE = 9.11 (5× ↑)
How MSE and MAE react to outliers. MSE amplifies large errors quadratically.
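A quick sketch to reproduce those numbers (metric helpers defined inline rather than imported from a library):

```python
import numpy as np

def mae(e):  return np.mean(np.abs(e))   # average absolute residual
def mse(e):  return np.mean(e ** 2)      # average squared residual
def rmse(e): return np.sqrt(mse(e))      # back in original units

clean   = np.array([1, 2, 1, 3, 2])      # residuals without outliers
outlier = np.array([1, 2, 1, 3, 20])     # same residuals, one large miss

for name, e in [("no outliers", clean), ("one outlier", outlier)]:
    print(f"{name}: MAE={mae(e):.2f}  MSE={mse(e):.2f}  RMSE={rmse(e):.2f}")
# no outliers: MAE=1.80  MSE=3.80  RMSE=1.95
# one outlier: MAE=5.40  MSE=83.00  RMSE=9.11
```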
🟡 Checkpoint

You're predicting delivery times. Most are off by 5-10 min, but occasionally you're off by 2 hours. Which metric should you optimize?

R²: The Proportion of Explained Variance

R-squared and Adjusted R-squared

1. SS_res = Σ(yᵢ − ŷᵢ)²: residual sum of squares, the total squared error of the MODEL.
2. SS_tot = Σ(yᵢ − ȳ)²: total sum of squares, the total squared error of the MEAN baseline.
3. R² = 1 − SS_res / SS_tot: the proportion of variance explained by the model vs. predicting the mean.
4. R² = 1.0 → perfect predictions (SS_res = 0). The model explains ALL variance.
5. R² = 0.0 → model = predicting the mean. The model explains NO variance beyond the baseline.
6. R² < 0 → model is WORSE than the mean! Yes, R² can be negative. The model is actively harmful.
7. Adjusted R² = 1 − [(1 − R²)(n − 1)] / (n − p − 1): penalizes adding features (p = number of features, n = number of samples).
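Both formulas fit in a few lines of NumPy. A minimal sketch with toy arrays invented for illustration; note how flipping the predictions drives R² negative:

```python
import numpy as np

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)          # model's squared error
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # mean baseline's squared error
    return 1 - ss_res / ss_tot

def adjusted_r2(y_true, y_pred, p):
    n = len(y_true)                                  # p = number of features
    return 1 - (1 - r2(y_true, y_pred)) * (n - 1) / (n - p - 1)

y_true = np.array([3.0, 5.0, 7.0, 9.0, 11.0])        # toy targets
y_pred = np.array([2.8, 5.3, 6.9, 9.4, 10.6])        # decent predictions
print(r2(y_true, y_pred))                # ≈ 0.99: far better than the mean
print(adjusted_r2(y_true, y_pred, p=1))  # slightly lower: penalized for p
print(r2(y_true, 14 - y_true))           # -3.0: worse than predicting the mean
```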
🔴 Challenge

Model A has R²=0.85 with 3 features. Model B has R²=0.86 with 50 features. Which is likely better?

MAPE & More

MAPE and Scale-Independent Metrics

Mean Absolute Percentage Error

📊 MAPE

Average percentage error — scale-independent, so you can compare across different targets. Great for saying 'we're off by 5%' regardless of the scale of the values.

MAPE = (100/n) Σ |yᵢ − ŷᵢ| / |yᵢ|
⚠️ MAPE Pitfall

Undefined when any actual value is zero (division by zero). Also asymmetric: with non-negative predictions, under-predictions are capped at 100% error, but over-predictions can grow without bound.

⚖️ sMAPE

Symmetric MAPE fixes some of MAPE's issues by dividing by the average of actual and predicted values. Bounded between 0–200%, making it more stable and symmetric across over/under-predictions.

sMAPE = (100/n) Σ |yᵢ − ŷᵢ| / ((|yᵢ| + |ŷᵢ|)/2)
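Both are a few lines in NumPy. A minimal sketch with made-up values; note the zero-division caveat from the pitfall above:

```python
import numpy as np

def mape(y_true, y_pred):
    # Undefined when any y_true is zero; guard or filter zeros in practice.
    return 100 * np.mean(np.abs(y_true - y_pred) / np.abs(y_true))

def smape(y_true, y_pred):
    # Denominator averages actual and predicted, bounding the result at 200%.
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2
    return 100 * np.mean(np.abs(y_true - y_pred) / denom)

y_true = np.array([100.0, 200.0, 50.0])
y_pred = np.array([110.0, 180.0, 60.0])
print(mape(y_true, y_pred))    # ≈ 13.3%
print(smape(y_true, y_pred))   # ≈ 12.7%
```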
Choosing Metrics

Practical Decision Guide

Do you have outliers? → Use MAE or Huber loss.
Need to punish big errors? → Use MSE or RMSE.
Need scale-independence? → Use MAPE or sMAPE.

Decision tree for choosing a regression metric.

🟡 Checkpoint

You're predicting quarterly revenue for companies ranging from $1M to $1B. An MAE of $5M means very different things for small vs. large companies. Which metric handles this?

🎓 What You Now Know

Residual = actual − predicted — Every regression metric is a function of residuals.

MAE = robust average error — Use by default. Resistant to outliers.

MSE/RMSE = punish large errors — Use when big misses are costly. Sensitive to outliers.

R² = variance explained vs. mean baseline — Adjusted R² for fair model comparison.

MAPE = scale-independent percentage error — Great for cross-scale comparison, but breaks at zero.

No single metric tells the whole story. Report MAE + R² at minimum. Add RMSE if outlier sensitivity matters, MAPE if scale-independence matters. And always compare against the trivial baseline — if predicting the mean does 95% as well, your model isn’t adding much value. 📊
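As a closing sketch, here is one way that baseline comparison might look (all values invented for illustration):

```python
import numpy as np

def report(label, y_true, y_pred):
    mae = np.mean(np.abs(y_true - y_pred))
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    print(f"{label}: MAE={mae:.2f}  R²={1 - ss_res / ss_tot:.3f}")

y_true = np.array([12.0, 15.0, 9.0, 20.0, 14.0])    # made-up targets
model = np.array([11.0, 16.0, 10.0, 18.0, 15.0])    # hypothetical model output
baseline = np.full_like(y_true, y_true.mean())      # trivial mean baseline

report("model", y_true, model)           # model: MAE=1.20  R²=0.879
report("mean baseline", y_true, baseline)  # baseline: MAE=2.80  R²=0.000
```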
