K-Means Clustering — Grouping Data Without Labels
A scroll-driven visual deep dive into K-Means clustering. Learn the iterative algorithm, choosing K with the elbow method, limitations, and when to use alternatives.
Unsupervised Learning
No labels? No problem.
Let the data group itself.
K-Means finds structure in unlabeled data by partitioning points into K clusters, each defined by its center. It’s the simplest, fastest, and most widely used clustering algorithm — and understanding its strengths and limits is essential.
Two Steps, Repeated
K-Means is guaranteed to converge, but is it guaranteed to find the GLOBAL optimum?
💡 Think about what happens if two initial centroids start in the same cluster...
K-Means decreases the objective (within-cluster sum of squares) at every step, so it ALWAYS converges. But it converges to a LOCAL minimum that depends entirely on where the initial centroids land. Bad initialization → bad clusters. The solution: K-Means++ initializes centroids far apart from each other, dramatically improving results. In practice, scikit-learn uses K-Means++ by default and can rerun the whole algorithm with several different initializations (the n_init parameter), keeping the run with the lowest inertia.
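A minimal sketch of what this looks like with scikit-learn's KMeans, assuming some synthetic blob data from make_blobs (the dataset and parameter values here are illustrative, not taken from the visual above):

```python
# Sketch: K-Means++ initialization with multiple restarts in scikit-learn.
# The blob data and parameter values are illustrative assumptions.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=1500, centers=4, cluster_std=1.0, random_state=42)

# init="k-means++" spreads the initial centroids apart;
# n_init=10 reruns the whole algorithm 10 times and keeps the lowest-inertia result.
km = KMeans(n_clusters=4, init="k-means++", n_init=10, random_state=42)
labels = km.fit_predict(X)

print("Best inertia (WCSS):", km.inertia_)
print("Centroid locations:\n", km.cluster_centers_)
```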
What K-Means Optimizes
Within-Cluster Sum of Squares (WCSS / Inertia)
J = Σₖ Σ_{x∈Cₖ} ||x − μₖ||²
Assignment step: cᵢ = argminₖ ||xᵢ − μₖ||²
Update step: μₖ = (1/|Cₖ|) Σ_{x∈Cₖ} x
Each step decreases J → guaranteed convergence.
K-Means uses Euclidean distance (L2). This means it finds clusters that are:
💡 Think about what region of space is 'closest' to a single point using Euclidean distance...
Because K-Means assigns points to the NEAREST centroid using Euclidean distance, cluster boundaries are Voronoi cells — convex polygons around each centroid. This means K-Means can only find roughly spherical clusters. It fails on crescent shapes, rings, or elongated clusters. For non-convex clusters, use DBSCAN or spectral clustering.
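To make the assignment and update steps concrete, here is a bare-bones NumPy sketch of the loop (Lloyd's algorithm). The function name, the naive random initialization, and the empty-cluster handling are illustrative assumptions; printing J each pass shows it never increasing.

```python
# Bare-bones sketch of the K-Means loop (Lloyd's algorithm) in NumPy.
# Naive random init (not K-Means++); names and structure are illustrative.
import numpy as np

def kmeans_lloyd(X, k, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: cᵢ = argminₖ ||xᵢ − μₖ||²
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: μₖ = mean of the points assigned to cluster k
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster goes empty
                centroids[j] = members.mean(axis=0)
        # J (WCSS) never increases from one iteration to the next
        J = ((X - centroids[labels]) ** 2).sum()
        print(f"WCSS: {J:.2f}")
    return labels, centroids
```

Run on the blob data above, the printed WCSS drops quickly and then flattens once the assignments stop changing.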
How Many Clusters?
The Silhouette Score for a point is defined as (b − a) / max(a, b), where a = average distance to the other points in its own cluster and b = average distance to the points in the nearest other cluster. A silhouette score of 0 means:
💡 If a = b, what is (b − a) / max(a, b)?
When silhouette = 0, a = b: the point is equidistant from its own cluster and the nearest other cluster. It's on the decision boundary. Silhouette = +1 means perfectly clustered (a ≪ b), and silhouette = −1 means probably misassigned (a ≫ b — closer to another cluster). Averaging silhouette scores across all points gives an overall quality metric for the clustering.
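Both criteria are easy to compute with scikit-learn. A small sketch, again assuming synthetic blob data, that sweeps K and reports the inertia (for an elbow plot) and the mean silhouette score:

```python
# Sketch: sweep K, record inertia (elbow method) and mean silhouette score.
# The dataset and the range of K values are illustrative assumptions.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=1000, centers=5, random_state=7)

for k in range(2, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=7).fit(X)
    sil = silhouette_score(X, km.labels_)  # mean of (b − a) / max(a, b) over all points
    print(f"K={k}  inertia={km.inertia_:10.1f}  silhouette={sil:.3f}")

# Elbow: pick the K where inertia stops dropping sharply.
# Silhouette: pick the K with the highest mean score.
```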
When K-Means Fails
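To see the failure concretely, the sketch below runs K-Means on the classic two-moons dataset, whose true clusters are crescent-shaped and non-convex, and compares it with DBSCAN. The dataset and the DBSCAN parameters (eps, min_samples) are illustrative assumptions:

```python
# Sketch: K-Means vs. DBSCAN on non-convex, crescent-shaped clusters.
# Dataset and DBSCAN parameters are illustrative assumptions.
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

X, y_true = make_moons(n_samples=600, noise=0.06, random_state=0)

km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
db_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# Adjusted Rand Index: 1.0 = matches the true grouping exactly, ~0 = random.
print("K-Means ARI:", adjusted_rand_score(y_true, km_labels))  # typically well below 1
print("DBSCAN  ARI:", adjusted_rand_score(y_true, db_labels))  # typically close to 1
```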
K-Means in the Real World
K-Means time complexity is O(n·K·d·i) where n=points, K=clusters, d=dimensions, i=iterations. For 1M points, 10 clusters, 100 dimensions, and 50 iterations, approximately how many distance computations?
💡 Multiply n × K × i...
n × K × i = 1,000,000 × 10 × 50 = 500,000,000 distance computations. Each distance computation involves d=100 dimensions. K-Means' linear scaling in n is what makes it practical for large datasets — unlike hierarchical clustering (O(n²) or O(n³)). Mini-batch K-Means speeds this up further by using random subsets of data per iteration.
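A short sketch of the mini-batch variant in scikit-learn; the (smaller) stand-in dataset and the batch_size are illustrative assumptions:

```python
# Sketch: MiniBatchKMeans fits on random subsets (mini-batches) per iteration,
# trading a little accuracy for much lower cost on large datasets.
# Dataset size and batch_size are illustrative assumptions.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(200_000, 100))  # a smaller stand-in for the 1M-point scenario above

mbk = MiniBatchKMeans(n_clusters=10, batch_size=10_000, n_init=3, random_state=0)
mbk.fit(X)

print("Inertia:", mbk.inertia_)
```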
🎓 What You Now Know
✓ K-Means alternates assign + update — Assign points to nearest centroid, move centroids to cluster mean.
✓ Minimizes WCSS (inertia) — Converges to a local optimum. Use K-Means++ for smart initialization.
✓ Choose K with elbow or silhouette — Or let domain knowledge decide.
✓ Assumes spherical clusters — Fails on non-convex shapes, varying sizes/densities.
✓ Fast and scalable — O(nKdi), linear in data size. Mini-batch variant for very large datasets.
K-Means is the “Hello World” of unsupervised learning — simple, fast, and surprisingly effective. But knowing its limitations is what separates a beginner from a practitioner. When K-Means fails, DBSCAN and PCA are your next moves. 🎯
↗ Keep Learning
DBSCAN — Finding Clusters of Any Shape
A scroll-driven visual deep dive into DBSCAN. Learn density-based clustering, core/border/noise points, choosing epsilon and minPts, and when DBSCAN beats K-Means.
PCA — Compressing Reality Without Losing the Plot
A scroll-driven visual deep dive into Principal Component Analysis. Learn eigenvectors, variance maximization, dimensionality reduction, and when PCA transforms your data — and when it doesn't.