
Logistic Regression — The Classifier That's Not Really Regression

A scroll-driven visual deep dive into logistic regression. Learn how a regression model becomes a classifier, why the sigmoid is the key, and how log-loss trains it.


Classification Fundamentals

How do you predict yes or no?

Will this email be spam? Will this patient have diabetes? Will this user click? Regression gives numbers. Classification gives answers.


From Numbers to Probabilities

Logistic regression in three steps

Step 1: z = w₁x₁ + w₂x₂ + ... + b
Compute a linear score, the same as in linear regression.

Step 2: σ(z) = 1 / (1 + e⁻ᶻ)
Squash the score through the sigmoid; the output now lies between 0 and 1.

Step 3: ŷ = 1 if σ(z) ≥ 0.5, else 0
Threshold the probability to get a class prediction.
[Figure: sigmoid curve with z on the horizontal axis and σ(z) on the vertical axis; the 0.5 level separates the Class 0 zone from the Class 1 zone]
The sigmoid squashes any number into (0, 1). Large positive → 1, large negative → 0, zero → 0.5
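To make the three steps concrete, here is a minimal NumPy sketch; the weights, bias, and input are made up for illustration, not learned from data.

```python
# A minimal sketch of the three-step pipeline (illustrative parameters, not fitted).
import numpy as np

def predict(x, w, b):
    z = np.dot(w, x) + b             # Step 1: linear score, same as linear regression
    p = 1.0 / (1.0 + np.exp(-z))     # Step 2: sigmoid squashes z into (0, 1)
    return (1 if p >= 0.5 else 0), p # Step 3: threshold at 0.5 to get a class

# Example with two features and hand-picked weights
label, prob = predict(x=np.array([2.0, -1.0]), w=np.array([0.8, 0.5]), b=-0.2)
print(label, round(prob, 3))  # z = 0.9, sigmoid(0.9) ≈ 0.711 → class 1
```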
🟢 Quick Check

What's the output range of the sigmoid function σ(z)?


The Decision Boundary

[Figure: two classes of points plotted on Feature 1 vs Feature 2, with the learned line marking the decision boundary where σ(z) = 0.5]
2D example: the model learns a line. Everything above → Class 1, everything below → Class 0.
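Why is the boundary a line? σ(z) = 0.5 exactly when z = 0, so the boundary is the set of points where w₁x₁ + w₂x₂ + b = 0. A tiny sketch, with weights chosen only for illustration, solves this for x₂:

```python
# A small sketch (assumed weights) showing that the boundary is just the line z = 0.
w1, w2, b = 0.8, 0.5, -0.2          # illustrative parameters, not from the article

def boundary_x2(x1):
    # sigmoid(z) = 0.5 exactly when z = 0, so solve w1*x1 + w2*x2 + b = 0 for x2
    return -(w1 * x1 + b) / w2

for x1 in [-2.0, 0.0, 2.0]:
    print(x1, boundary_x2(x1))      # points on the line separating Class 0 from Class 1
```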
🟡 Checkpoint

Logistic regression's decision boundary is always a:


How We Train It: Log-Loss

Binary cross-entropy (log-loss)

📉 Log-Loss = −[ y·log(ŷ) + (1−y)·log(1−ŷ) ]

When y = 1

Only the first term, −y·log(ŷ), survives, so the loss is −log(ŷ). A confident correct prediction (ŷ ≈ 0.99) costs nearly zero, but a confident wrong prediction (ŷ ≈ 0.01) costs about 4.6, and the penalty grows without bound as the mistake becomes more confident.

When y = 0

Only the second term, −(1−y)·log(1−ŷ), survives, so the loss is −log(1−ŷ). The model is penalized for assigning high probability to the wrong class; the more confident the mistake, the steeper the penalty.
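Combining the two cases gives the single formula above. Here is a small NumPy sketch of it (the clipping constant is an assumption added to avoid log(0)), which also reproduces the ~4.6 penalty for a confident mistake:

```python
# Binary cross-entropy for one sample; a minimal sketch with clipping to avoid log(0).
import numpy as np

def log_loss(y_true, y_pred, eps=1e-12):
    p = np.clip(y_pred, eps, 1 - eps)
    return -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

print(log_loss(1, 0.99))  # confident and correct → ~0.01
print(log_loss(1, 0.01))  # confident and wrong   → ~4.6
print(log_loss(0, 0.99))  # class 0 predicted as 0.99 → ~4.6
```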
🟡 Checkpoint

A logistic regression model predicts 0.99 probability for a sample that is actually class 0. What's the log-loss for this sample?


Beyond Binary: Multi-class Classification

Softmax for multi-class

Step 1: z₁, z₂, ..., zₖ = raw scores for the K classes
Each class gets its own weight vector, and therefore its own score.

Step 2: P(class k) = exp(zₖ) / Σⱼ exp(zⱼ)
Softmax: exponentiate and normalize. All probabilities sum to 1.

Step 3: Prediction = argmax(P(class 1), ..., P(class K))
Pick the class with the highest probability.
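A minimal sketch of those three steps; the raw scores are made up for illustration, and the max-subtraction is a standard numerical-stability trick rather than part of the math above:

```python
# A minimal softmax sketch for K = 3 classes (scores are illustrative).
import numpy as np

z = np.array([2.0, 1.0, 0.1])         # raw scores, one per class
exp_z = np.exp(z - z.max())           # subtract max for numerical stability
probs = exp_z / exp_z.sum()           # exponentiate and normalize

print(probs.round(3))                 # e.g. [0.659 0.242 0.099], sums to 1
print("prediction:", probs.argmax())  # pick the class with highest probability
```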
🟡 Checkpoint

In logistic regression with 5 classes, how many weight vectors does the model learn?


When to Use Logistic Regression

Great for: linear boundaries, interpretability, baselines
OK for: text classification with TF-IDF, high-dimensional sparse data
Bad for: images, nonlinear boundaries, complex feature interactions
💡 Pro tip: always try logistic regression first; it's your baseline.
Logistic regression: deceptively powerful for the right problems
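Acting on that pro tip is cheap. Here is a minimal scikit-learn sketch of a logistic regression baseline, using a synthetic dataset purely for illustration:

```python
# A minimal baseline sketch with scikit-learn (synthetic data for illustration).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))               # the baseline to beat
print("P(class 1):", clf.predict_proba(X_test[:3])[:, 1])   # predicted probabilities
```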

🎓 What You Now Know

Logistic regression = linear model + sigmoid — Squash z ∈ (-∞,∞) into p ∈ (0,1).

Decision boundary is always linear — A hyperplane where σ(z) = 0.5.

Train with log-loss, not MSE — Cross-entropy is convex and punishes confident mistakes.

Multi-class uses softmax — K scores → K probabilities that sum to 1.

Always start with logistic regression — It’s fast, interpretable, and sets a strong baseline.

Logistic regression is the workhorse of classification. It’s used everywhere — spam detection, medical diagnosis, click-through prediction, credit scoring. And the sigmoid + cross-entropy combination is the same output layer used in neural networks. You just learned a neural net’s final layer. 🚀
