8 min deep dive · machine-learning · unsupervised

DBSCAN — Finding Clusters of Any Shape

A scroll-driven visual deep dive into DBSCAN. Learn density-based clustering, core/border/noise points, choosing epsilon and minPts, and when DBSCAN beats K-Means.

Introduction

Density-Based Clustering

Clusters aren’t circles.
DBSCAN knows that.

K-Means forces spherical clusters and needs you to pick K. DBSCAN finds clusters of ANY shape, automatically determines how many clusters exist, and naturally identifies outliers. All it needs: a notion of “how close is close enough.”
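To make this concrete, here is a minimal sketch using scikit-learn (the dataset and parameter values are illustrative, not prescriptive): DBSCAN separates two interleaved half-moons, a shape K-Means cannot recover, without being told how many clusters exist.

```python
# A minimal sketch, assuming scikit-learn is available: DBSCAN on
# two interleaved half-moons, a shape K-Means cannot separate.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.cluster import DBSCAN

X, _ = make_moons(n_samples=500, noise=0.06, random_state=42)

# eps = "how close is close enough"; min_samples = minPts
db = DBSCAN(eps=0.2, min_samples=5).fit(X)

# DBSCAN labels noise points -1, so exclude them from the cluster count.
n_clusters = len(set(db.labels_)) - (1 if -1 in db.labels_ else 0)
n_noise = int(np.sum(db.labels_ == -1))
print(f"clusters found: {n_clusters}, noise points: {n_noise}")
```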


Three Types of Points

Core point: ≥ minPts neighbors within the ε radius.
Border point: fewer than minPts neighbors, but within ε of a core point.
Noise point: fewer than minPts neighbors and not near any core point.

Core, border, and noise points (ε = radius, minPts = 4)
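The three types fall out of two simple quantities: how many neighbors a point has within ε, and whether any of those neighbors is a core point. A sketch, assuming scikit-learn's NearestNeighbors for the radius queries (classify_points and its default values are illustrative):

```python
# A sketch of classifying core / border / noise points from
# each point's ε-neighbor count and its proximity to a core point.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def classify_points(X, eps=1.0, min_pts=4):
    nn = NearestNeighbors(radius=eps).fit(X)
    # Each neighborhood includes the point itself, matching the usual formulation.
    neighborhoods = nn.radius_neighbors(X, return_distance=False)
    counts = np.array([len(idx) for idx in neighborhoods])

    is_core = counts >= min_pts
    # Border: not core, but some core point lies within eps of it.
    is_border = np.array([
        (not is_core[i]) and any(is_core[j] for j in neighborhoods[i])
        for i in range(len(X))
    ])
    is_noise = ~is_core & ~is_border
    return is_core, is_border, is_noise
```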
🟢 Quick Check

With ε = 1.0 and minPts = 5, a point has 3 neighbors within distance 1.0, one of which is a core point. What type of point is this?


DBSCAN: Step by Step

1. 🎯 Pick an unvisited point p at random.
2. 🔍 Is p a core point (≥ minPts neighbors within ε)?
3. 🌱 If yes: start a new cluster and expand it through core-point neighbors.
4. If no: mark p as noise (it may become a border point later).
5. Repeat until all points are visited.
DBSCAN expands clusters from core points through density-reachability
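The loop above translates almost line-for-line into code. A from-scratch sketch, using brute-force O(n²) neighbor queries for clarity (a real implementation would use a spatial index):

```python
# A from-scratch sketch of the expansion loop described above.
import numpy as np

def dbscan(X, eps, min_pts):
    n = len(X)
    labels = np.full(n, -1)            # -1 = noise (may be relabeled border)
    visited = np.zeros(n, dtype=bool)
    cluster_id = 0

    def region_query(i):
        # All points within eps of point i (brute force, O(n) per query).
        dists = np.linalg.norm(X - X[i], axis=1)
        return np.flatnonzero(dists <= eps)

    for p in range(n):
        if visited[p]:
            continue
        visited[p] = True
        neighbors = region_query(p)
        if len(neighbors) < min_pts:   # not a core point: noise, for now
            continue
        # p is a core point: start a cluster and expand it.
        labels[p] = cluster_id
        seeds = list(neighbors)
        while seeds:
            q = seeds.pop()
            if labels[q] == -1:        # previously-marked noise becomes border
                labels[q] = cluster_id
            if visited[q]:
                continue
            visited[q] = True
            labels[q] = cluster_id
            q_neighbors = region_query(q)
            if len(q_neighbors) >= min_pts:   # q is core: expand through it
                seeds.extend(q_neighbors)
        cluster_id += 1
    return labels
```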
🟡 Checkpoint

DBSCAN's time complexity is O(n²) in the worst case. What data structure brings it down to O(n log n)?

The Two Parameters: Choosing ε and minPts

Practical guidelines

📏 minPts Rule

Set minPts to at least the number of dimensions plus one. For noisy data, a larger value such as 2×d gives smoother clusters, with more points labeled as noise.

minPts ≥ dimensions + 1
📈 k-Distance Plot

For each point, compute the distance to its k-th nearest neighbor (k = minPts), sort all distances, and plot them. This reveals the density structure of your data.

🦵 Find the Knee

The k-distance plot shows a sharp increase at the boundary between cluster density and noise. Set ε at this 'knee' — it's the natural threshold separating dense regions from sparse ones.
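A sketch of that recipe, assuming scikit-learn and matplotlib are available (k_distance_plot is an illustrative helper, not a library function). Note the k + 1: each point's nearest neighbor is itself.

```python
# A sketch of the k-distance plot: sort every point's distance to its
# k-th nearest neighbor and read eps off the knee of the curve.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import NearestNeighbors

def k_distance_plot(X, k):
    # k + 1 because the nearest neighbor of each point is itself.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dists, _ = nn.kneighbors(X)
    k_dists = np.sort(dists[:, -1])   # distance to the k-th true neighbor

    plt.plot(k_dists)
    plt.xlabel("points, sorted by k-distance")
    plt.ylabel(f"distance to {k}-th nearest neighbor")
    plt.title("Read eps off the knee of this curve")
    plt.show()
```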

⚠️ Sensitivity Check

If ε is too small, nearly everything becomes noise. If ε is too large, distinct clusters merge into one blob. The k-distance plot helps find the sweet spot between these extremes.
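A quick way to feel this sensitivity is to sweep ε and watch the cluster and noise counts swing between the two failure modes; a sketch, with illustrative values:

```python
# A sensitivity sketch: tiny eps labels everything noise,
# huge eps merges everything into one blob.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.cluster import DBSCAN

X, _ = make_moons(n_samples=500, noise=0.06, random_state=0)

for eps in (0.02, 0.2, 2.0):
    labels = DBSCAN(eps=eps, min_samples=5).fit_predict(X)
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    n_noise = int(np.sum(labels == -1))
    print(f"eps={eps:>4}: {n_clusters} clusters, {n_noise} noise points")
```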


DBSCAN vs K-Means

K-Means
• Must specify K
• Spherical clusters only
• Every point belongs to a cluster
• Sensitive to outliers
• Fast: O(n·K·d·i), where i is the number of iterations
• Tends toward equal-sized clusters
Best for: well-separated blobs, known K

DBSCAN
• Finds K automatically
• Clusters of any shape
• Identifies noise/outliers
• Robust to outliers
• Slower: O(n log n) to O(n²)
• Varying density is tricky
Best for: arbitrary shapes, outlier detection
DBSCAN finds arbitrary shapes and outliers; K-Means needs spherical clusters and a known K
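A side-by-side sketch on non-spherical data (parameter values are illustrative): scoring both clusterings against the true moon labels with the adjusted Rand index makes the gap visible.

```python
# K-Means splits each moon down the middle; DBSCAN recovers both shapes.
from sklearn.datasets import make_moons
from sklearn.cluster import KMeans, DBSCAN
from sklearn.metrics import adjusted_rand_score

X, y_true = make_moons(n_samples=500, noise=0.06, random_state=42)

km = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X)
db = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

print("K-Means ARI:", round(adjusted_rand_score(y_true, km), 3))
print("DBSCAN  ARI:", round(adjusted_rand_score(y_true, db), 3))
```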
🔴 Challenge

DBSCAN struggles with clusters of VARYING density. Why?


Practical Tips

🎓 What You Now Know

Three point types — Core (dense), border (near core), noise (isolated).

Clusters grow by density-reachability — Chain reaction through core points.

Use k-distance plot for ε — Find the knee in sorted k-neighbor distances.

Finds any shape, any number of clusters — Unlike K-Means’ spherical assumption.

Struggles with varying density — Use HDBSCAN for multi-density data (see the sketch below).
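For that last point, a sketch of the multi-density case, assuming scikit-learn ≥ 1.3 (which ships sklearn.cluster.HDBSCAN; the standalone hdbscan package has a similar interface). The blob parameters are illustrative.

```python
# HDBSCAN on multi-density data: one tight blob, one diffuse blob.
# Unlike DBSCAN, there is no single global eps to tune.
from sklearn.cluster import HDBSCAN
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=400, centers=[(0, 0), (5, 5)],
                  cluster_std=[0.3, 1.5], random_state=7)

# HDBSCAN adapts the density threshold per cluster.
labels = HDBSCAN(min_cluster_size=15).fit_predict(X)
print("clusters:", sorted(set(labels) - {-1}),
      "| noise points:", list(labels).count(-1))
```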

DBSCAN is the algorithm you reach for when K-Means fails. Its density-based approach is beautiful in its simplicity: dense regions are clusters, sparse regions are noise. Understanding when each clustering algorithm works — and when it doesn’t — is what separates a data scientist from someone who just calls .fit(). 🔬

📄 DBSCAN: A Density-Based Algorithm for Discovering Clusters (Ester et al., 1996)
