12 min deep dive · machine-learning · unsupervised

K-Means Clustering — Grouping Data Without Labels

A scroll-driven visual deep dive into K-Means clustering. Learn the iterative algorithm, how to choose K with the elbow method, where the algorithm breaks down, and when to reach for alternatives.

Introduction

Unsupervised Learning

No labels? No problem.
Let the data group itself.

K-Means finds structure in unlabeled data by partitioning points into K clusters, each defined by its center (centroid). It is one of the simplest, fastest, and most widely used clustering algorithms, and understanding both its strengths and its limits is essential.
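In practice you rarely write K-Means by hand. As a quick orientation, here is a minimal sketch using scikit-learn; the library choice and the make_blobs toy data are illustrative assumptions, not part of this article:

```python
# A minimal sketch: clustering unlabeled points with scikit-learn's KMeans.
# Dataset and parameter values are illustrative, not prescriptive.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# 300 unlabeled 2-D points drawn from 3 hidden groups
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)   # cluster index (0..K-1) for each point

print(kmeans.cluster_centers_)   # learned centroids
print(kmeans.inertia_)           # WCSS, the quantity K-Means minimizes
```

fit_predict returns a cluster index for every point, and inertia_ is the objective the algorithm minimizes; both come up again below.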

The Algorithm

Two Steps, Repeated

🎯 Initialize: place K random centroids
📌 Assign: each point → nearest centroid
📐 Update: move each centroid → mean of its cluster
Converged? No change → stop; otherwise repeat

K-Means alternates between assignment and update steps until convergence
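The whole loop fits in a dozen lines of NumPy. Here is a from-scratch sketch of the assign/update iteration; the function and variable names are mine, chosen for clarity:

```python
import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    """Plain NumPy K-Means: alternate assignment and update until stable.
    A didactic sketch (random initialization, no K-Means++)."""
    rng = np.random.default_rng(seed)
    # Initialize: pick K random data points as starting centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Assign: index of the nearest centroid for every point (Euclidean)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update: move each centroid to the mean of its assigned points
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Converged: stop when no centroid moved
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```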
🟢 Quick Check

K-Means is guaranteed to converge, but is it guaranteed to find the GLOBAL optimum?

The Math

What K-Means Optimizes

Within-Cluster Sum of Squares (WCSS / Inertia)

1. Objective: J = Σₖ Σ_{x∈Cₖ} ||x − μₖ||². Minimize the total squared distance from each point to its assigned centroid.
2. Assignment step: cᵢ = argminₖ ||xᵢ − μₖ||². Assign each point to its nearest centroid.
3. Update step: μₖ = (1/|Cₖ|) Σ_{x∈Cₖ} x. Move each centroid to the mean of its assigned points.
4. Convergence: both steps reduce or maintain J, and J is bounded below by zero, so the algorithm must converge. It stops when J no longer changes.
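To make J concrete, here is a small helper (the name is mine) that computes the WCSS for a given assignment; it is the same quantity scikit-learn reports as inertia_:

```python
import numpy as np

def wcss(X, labels, centroids):
    """Within-cluster sum of squares: J = sum over all points of the
    squared distance to their assigned centroid."""
    return sum(
        np.sum((X[labels == j] - centroids[j]) ** 2)
        for j in range(len(centroids))
    )
```

Logging this value once per iteration of the loop above should produce a non-increasing sequence, which is exactly the convergence argument in step 4.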
🟡 Checkpoint

K-Means uses Euclidean distance (L2). This means it finds clusters that are:

Choosing K

How Many Clusters?

[Chart: WCSS plotted against K for K = 1–6; the curve bends sharply at K = 3, marking the elbow]
The Elbow Method: plot WCSS vs K and look for the 'elbow' where returns diminish
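Producing that chart takes only a few lines. The sketch below assumes scikit-learn and matplotlib (my choices, not specified by the article) and reuses the toy blobs from earlier:

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# Fit K-Means for each candidate K and record the final WCSS (inertia)
ks = list(range(1, 7))
wcss_values = [KMeans(n_clusters=k, n_init=10, random_state=42).fit(X).inertia_
               for k in ks]

plt.plot(ks, wcss_values, marker="o")
plt.xlabel("K (number of clusters)")
plt.ylabel("WCSS (inertia)")
plt.show()
```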
🟡 Checkpoint

The Silhouette Score for a point is defined as (b − a) / max(a, b), where a = avg distance to same-cluster points, b = avg distance to nearest other cluster. A silhouette score of 0 means:
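For reference, scikit-learn computes the mean silhouette across all points, so candidate K values can be compared directly. A hedged sketch on the same toy data:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# silhouette_score averages (b - a) / max(a, b) over all points (range -1..1);
# pick the K with the highest mean silhouette
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    print(k, round(silhouette_score(X, labels), 3))
```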

Limitations

When K-Means Fails

[Figure, three panels: Different Sizes (splits the big cluster), Non-Spherical (wrong boundaries), Diff Densities (absorbs the sparse cluster)]
K-Means assumes equal-size, spherical clusters; it fails on other shapes
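You can reproduce the non-spherical failure in a few lines. This sketch assumes scikit-learn's make_moons toy set, with the DBSCAN eps tuned by eye for it:

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaved crescents: two obvious clusters, but not spherical ones
X, y_true = make_moons(n_samples=400, noise=0.05, random_state=42)

km = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X)
db = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# K-Means cuts the moons with a straight boundary; DBSCAN follows density
print("K-Means agreement:", adjusted_rand_score(y_true, km))  # typically low
print("DBSCAN agreement: ", adjusted_rand_score(y_true, db))  # typically ~1.0
```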
In Practice

K-Means in the Real World

🔴 Challenge

K-Means time complexity is O(n·K·d·i) where n=points, K=clusters, d=dimensions, i=iterations. For 1M points, 10 clusters, 100 dimensions, and 50 iterations, approximately how many distance computations?
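To check your answer: the number of point-to-centroid distance computations is n · K · i, and each individual distance costs O(d) arithmetic on top of that.

```python
n, K, d, i = 1_000_000, 10, 100, 50

distances = n * K * i   # one distance per point, per centroid, per iteration
print(distances)        # 500_000_000, i.e. 5 × 10⁸ distance computations
print(distances * d)    # ≈ 5 × 10¹⁰ scalar operations in total
```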

🎓 What You Now Know

K-Means alternates assign + update — Assign points to nearest centroid, move centroids to cluster mean.

Minimizes WCSS (inertia) — Converges to a local optimum. Use K-Means++ for smart initialization.

Choose K with elbow or silhouette — Or let domain knowledge decide.

Assumes spherical clusters — Fails on non-convex shapes, varying sizes/densities.

Fast and scalable — O(nKdi), linear in data size. Mini-batch variant for very large datasets.

K-Means is the “Hello World” of unsupervised learning: simple, fast, and surprisingly effective. But knowing its limitations is what separates a beginner from a practitioner. When K-Means fails, DBSCAN (for arbitrarily shaped, density-based clusters) and PCA (to reduce dimensions before clustering) are your next moves. 🎯

Keep Learning