Polynomial Regression — When Lines Aren't Enough
A scroll-driven visual deep dive into polynomial regression. See why straight lines fail, how curves capture nonlinear patterns, and when you're overfitting vs underfitting.
Beyond Straight Lines
What happens when reality isn’t linear?
House prices don’t grow linearly with size. Drug dosage effects aren’t straight lines. Temperature and ice cream sales? Definitely curved. Time to bend your regression line.
When Linear Regression Fails
Linear regression fits y = mx + b — a straight line. But many real-world relationships are:
- Quadratic — goes up then down (or vice versa)
- Exponential-ish — accelerates over time
- Saturating — grows fast then plateaus
- Cyclic — oscillates periodically
A straight line through curved data gives terrible predictions. The fix? Add powers of x as features.
Why does linear regression fail on curved data?
💡 What shape can y = mx + b produce? Only one...
Linear regression assumes y = mx + b — a constant slope. If the true relationship curves (goes up then down, accelerates, or saturates), a straight line will systematically miss the pattern, producing high bias. Polynomial regression fixes this by adding x², x³, etc. as features.
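A quick way to see this failure mode is to fit a straight line and a parabola to the same curved data and compare training error. The sketch below uses a made-up noisy parabola (an assumption for illustration, not a dataset from this article):

```python
import numpy as np

# Toy curved data: a noisy downward parabola (hypothetical example)
rng = np.random.default_rng(42)
x = np.linspace(-3, 3, 50)
y = -x**2 + 2 * x + 1 + rng.normal(0, 0.5, x.size)

for deg in (1, 2):
    w = np.polyfit(x, y, deg)                        # least-squares fit of that degree
    mse = np.mean((np.polyval(w, x) - y) ** 2)
    print(f"degree {deg}: training MSE = {mse:.2f}")
# The degree-1 line misses the curvature entirely (high bias);
# the degree-2 fit tracks it closely.
```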
From Lines to Curves
Building the polynomial model
Degree 1: y = w₁x + w₀
A straight line with one slope and one intercept. Can only model constant-rate relationships — no curves, peaks, or valleys.
Degree 2: y = w₂x² + w₁x + w₀
A parabola that can model one peak or valley. Adding x² lets the model capture acceleration or deceleration in the data.
Degree 3: y = w₃x³ + w₂x² + w₁x + w₀
An S-curve with up to two turning points. Captures more complex patterns like initial growth, plateau, then decline.
Degree d: y = w_d·xᵈ + ... + w₁x + w₀
The general polynomial with d+1 coefficients. Higher degree means more flexibility, but also more risk of overfitting to noise rather than learning the true signal.
Fitting still uses least squares
Feature matrix X = [1, x, x², ..., xᵈ]; loss = Σ(yᵢ - ŷᵢ)²; solution w* = (XᵀX)⁻¹Xᵀy.
A polynomial regression model uses y = w₃x³ + w₂x² + w₁x + w₀. Is this model linear or nonlinear?
💡 Look at how the w's appear in the equation — are any of them squared or multiplied together?
This is a crucial distinction. The model is nonlinear IN x (it produces curves). But it's LINEAR IN THE PARAMETERS (w₀, w₁, w₂, w₃). Each parameter appears only once, multiplied by a fixed transformation of x. This means we can still use the normal equation (XᵀX)⁻¹Xᵀy — we just build X with columns [1, x, x², x³].
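Here is a minimal NumPy sketch of that idea, on toy cubic data assumed for illustration: build the feature columns [1, x, x², x³] by hand and solve the same normal equation used for plain linear regression.

```python
import numpy as np

# Toy cubic data (hypothetical): y ≈ 0.5x³ - x plus noise
rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 40)
y = 0.5 * x**3 - x + rng.normal(0, 0.3, x.size)

# Linear in the parameters: each column is a fixed transformation of x
X = np.column_stack([np.ones_like(x), x, x**2, x**3])   # [1, x, x², x³]
w = np.linalg.solve(X.T @ X, X.T @ y)                   # w* = (XᵀX)⁻¹Xᵀy
print(w)  # roughly [0, -1, 0, 0.5]: one weight per feature column
```

In practice np.linalg.lstsq (or scikit-learn with PolynomialFeatures) is preferred over forming XᵀX explicitly, since these Vandermonde-style matrices become badly conditioned at higher degrees.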
Degree 2 vs 5 vs 20 — What Happens?
How to pick the right degree
The validation error curve typically looks like a U-shape:
- Low degree → high bias → high training AND validation error
- Right degree → low bias, low variance → low validation error
- High degree → low training error but HIGH validation error → overfitting
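A rough sketch of that degree sweep, using a synthetic sine dataset and a simple holdout split (both are assumptions for illustration, not this article's setup):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(-3, 3, 30))
y = np.sin(x) + rng.normal(0, 0.2, x.size)
x_tr, y_tr = x[::2], y[::2]      # 15 training points
x_va, y_va = x[1::2], y[1::2]    # 15 validation points

for deg in (1, 3, 5, 10):
    w = np.polyfit(x_tr, y_tr, deg)
    tr = np.mean((np.polyval(w, x_tr) - y_tr) ** 2)
    va = np.mean((np.polyval(w, x_va) - y_va) ** 2)
    print(f"degree {deg:2d}: train MSE {tr:.3f} | val MSE {va:.3f}")
# Typical pattern: degree 1 is bad on both sets, a moderate degree wins on
# validation, and high degrees keep shrinking train error while val error rises.
```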
You fit a degree-15 polynomial to 10 data points. Training MSE is nearly zero. What's likely happening?
💡 If you have 16 knobs to adjust and only 10 constraints...
With 16 parameters (degree 15 + intercept) and only 10 data points, the model has MORE parameters than data. It can pass through every single point exactly (zero training error) but the wild oscillations between points mean terrible predictions on new data. This is classic overfitting. Rule of thumb: keep the number of parameters well below the number of data points.
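That situation is easy to reproduce in a direct least-squares sketch with 16 polynomial features and only 10 points (synthetic sine data, assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0, 1, 10)                      # only 10 data points
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, x.size)

X = np.vander(x, 16, increasing=True)          # degree 15 -> 16 feature columns
w, *_ = np.linalg.lstsq(X, y, rcond=None)      # more parameters than points
print(np.mean((X @ w - y) ** 2))               # training MSE is essentially 0
# The fit threads every point, but between them it oscillates wildly,
# so predictions at new x values are unreliable.
```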
The Overfitting Problem
What overfitting looks like mathematically
The coefficient explosion
Degree 2: y = 0.5x² - 2x + 1
Degree 15: y = 847x¹⁵ - 12340x¹⁴ + ...
Solution: penalize large coefficients.
You're choosing between a degree-2 and a degree-8 polynomial. Both have similar validation error. Which should you prefer?
💡 Think about what happens when the data distribution shifts slightly...
When two models perform equally well, always prefer the simpler one. The degree-2 model uses only 3 parameters vs 9 for degree-8. It's more interpretable, more robust to distribution shifts, and less likely to break on edge cases. This principle is called Occam's razor (or the principle of parsimony) and it's fundamental to all of ML.
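To connect this back to the coefficient explosion above, here's a rough check (synthetic quadratic data, assumed for illustration) of how the largest fitted weight tends to grow with degree:

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 20)
y = 0.5 * x**2 - 2 * x + 1 + rng.normal(0, 0.05, x.size)

for deg in (2, 8, 15):
    X = np.vander(x, deg + 1, increasing=True)   # columns [1, x, ..., x^deg]
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(f"degree {deg:2d}: max |weight| = {np.abs(w).max():.1f}")
# Low degrees keep small, stable weights; high degrees typically need large,
# nearly cancelling coefficients to chase the noise, which is exactly what
# Ridge and Lasso penalize (see the Ridge & Lasso link below).
```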
Polynomial Regression in the Real World
You have 3 input features and want to use degree-4 polynomial regression. How many terms will the model need (approximately)?
💡 It's not a simple multiplication — you need to include interaction terms like x₁²x₂...
With d features and degree p, the number of polynomial terms is C(d+p, p) = (d+p)!/(d!·p!). Here that's C(3+4, 4) = 7!/(3!·4!) = 35, including all cross-terms like x₁²x₂x₃. With 10 features and degree 4, it's C(14, 4) = 1001 terms! This combinatorial explosion is why polynomial regression is rarely used beyond 2-3 features — and why methods like Random Forests and neural nets dominate in high dimensions.
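That count is easy to verify (assuming Python's math module and, optionally, scikit-learn for a cross-check):

```python
from math import comb

# C(d + p, p): all terms of total degree ≤ p over d features, including the bias
print(comb(3 + 4, 4))     # 35   (3 features, degree 4)
print(comb(10 + 4, 4))    # 1001 (10 features, degree 4)

# Cross-check by actually expanding the features with scikit-learn
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
X = np.zeros((1, 3))                                            # one sample, 3 features
print(PolynomialFeatures(degree=4).fit_transform(X).shape[1])   # 35
```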
🎓 What You Now Know
✓ Linear regression fails on curved data — It can only model constant-rate relationships.
✓ Polynomial regression adds x², x³, … as features — It’s still linear in the weights, so the normal equation works.
✓ Higher degree ≠ better — Too high a degree leads to overfitting (memorizing noise).
✓ Use cross-validation to choose degree — Plot train vs validation error and pick the sweet spot.
✓ Polynomial regression doesn’t scale to many features — The number of terms explodes combinatorially.
Polynomial regression is the bridge between simple linear models and complex nonlinear ones. It teaches you overfitting, model selection, and the bias-variance tradeoff — the three most important concepts in all of ML. 🚀
↗ Keep Learning
Linear Regression — The Foundation of Machine Learning
A scroll-driven visual deep dive into linear regression. From data points to loss functions to gradient descent — understand the building block behind all of ML.
Ridge & Lasso — Taming Overfitting with Regularization
A scroll-driven visual deep dive into Ridge and Lasso regression. Learn why models overfit, how penalizing large weights fixes it, and why Lasso kills features.
Bias-Variance Tradeoff — The Most Important Concept in ML
A scroll-driven visual deep dive into the bias-variance tradeoff. Learn why every model makes errors, how underfitting and overfitting emerge, and how to balance them.