9 min deep-dive · machine-learning · unsupervised · linear-algebra

PCA — Compressing Reality Without Losing the Plot

A scroll-driven visual deep dive into Principal Component Analysis. Learn eigenvectors, variance maximization, and dimensionality reduction, and see when PCA helps your data and when it doesn't.

Introduction

Dimensionality Reduction

100 features. 3 that matter.
PCA finds them.

Principal Component Analysis rotates your high-dimensional data to find the axes of maximum variance. Project onto the top few axes and you’ve compressed reality — keeping the signal, dropping the noise.
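To make that concrete, here is a minimal sketch with scikit-learn on synthetic data; the sample counts, the 3-factor structure, and the 95% variance threshold are illustrative choices, not anything prescribed by PCA itself.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data: 1,000 samples, 100 features, but only 3 underlying factors plus noise
rng = np.random.default_rng(42)
factors = rng.normal(size=(1_000, 3))      # the 3 directions that actually matter
mixing = rng.normal(size=(3, 100))         # spread them across 100 observed features
X = factors @ mixing + 0.1 * rng.normal(size=(1_000, 100))

# Keep as many components as needed to explain 95% of the variance
pca = PCA(n_components=0.95)
Z = pca.fit_transform(X)
print(X.shape, "->", Z.shape)              # roughly (1000, 100) -> (1000, 3)
```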

Intuition

Find the Axis of Maximum Spread

[Figure: scatter plot of the original 2D data (correlated features), with PC1 drawn along the direction of max variance and PC2 perpendicular to PC1]
PCA finds the direction of maximum variance (PC1), then the perpendicular direction of second-most variance (PC2).
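A quick NumPy check of the same picture, on synthetic correlated 2D data (the correlation strength is arbitrary): the top eigenvector of the covariance matrix points along the spread, and the second one comes out perpendicular to it.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1_000)
y = 0.8 * x + 0.3 * rng.normal(size=1_000)   # y mostly follows x: correlated features
data = np.column_stack([x, y])

cov = np.cov(data, rowvar=False)             # 2x2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order

pc1 = eigvecs[:, -1]                         # direction of maximum variance
pc2 = eigvecs[:, -2]                         # second-most variance
print("PC1:", pc1)
print("PC2:", pc2)
print("perpendicular?", np.isclose(pc1 @ pc2, 0.0))   # True: eigenvectors of a symmetric matrix
```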
🟢 Quick Check

PCA finds directions of maximum variance. Why is maximum variance a good criterion for compression?

The Math

The Eigenvalue Recipe

PCA in 4 steps

1. Center: X̃ = X − μ
   Subtract the mean of each feature so the data is centered at the origin.
2. Covariance matrix: C = (1/n) X̃ᵀX̃
   C is a p×p matrix where C_ij is the covariance between feature i and feature j.
3. Eigendecompose: C = VΛVᵀ
   V = eigenvectors (principal components), Λ = eigenvalues (variance explained).
4. Project: Z = X̃V_k
   V_k = top k eigenvectors → Z is your k-dimensional representation.
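The same four steps fit in a few lines of NumPy. This is a bare-bones sketch; the function name, the 1/n covariance convention, and the toy data are choices made here, not part of any library API.

```python
import numpy as np

def pca(X, k):
    """Return the k-dimensional projection, the top-k components, and all eigenvalues."""
    X_centered = X - X.mean(axis=0)                 # 1. center each feature
    C = X_centered.T @ X_centered / len(X)          # 2. p x p covariance matrix (1/n convention)
    eigvals, eigvecs = np.linalg.eigh(C)            # 3. eigendecompose (symmetric matrix)
    order = np.argsort(eigvals)[::-1]               #    sort by variance, descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    V_k = eigvecs[:, :k]                            # 4. top k eigenvectors
    Z = X_centered @ V_k                            #    project
    return Z, V_k, eigvals

# Toy usage: compress 5-feature data down to 2 dimensions
X = np.random.default_rng(1).normal(size=(200, 5))
Z, V_k, eigvals = pca(X, k=2)
print(Z.shape)   # (200, 2)
```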
🟡 Checkpoint

You run PCA on 100-dimensional data and get eigenvalues λ₁=50, λ₂=30, λ₃=10, λ₄=5, λ₅=3, and the rest sum to 2. How much variance is explained by the top 3 components?

How Many Components?

The Scree Plot

[Scree plot: eigenvalues plotted against component number, with the top 3 kept and the components after the elbow marked as noise to drop]
Plot eigenvalues in descending order and look for the 'elbow', the point where the variance explained drops off sharply.
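If you prefer numbers to eyeballing the elbow, the explained-variance ratio (each eigenvalue divided by their sum) gives the same information. A sketch with scikit-learn on stand-in data; the 95% cutoff is a common rule of thumb, not a law.

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.default_rng(0).normal(size=(500, 20))   # stand-in data

pca = PCA().fit(X)                                     # keep all components to see the full spectrum
ratios = pca.explained_variance_ratio_                 # eigenvalue_i / sum of eigenvalues
cumulative = np.cumsum(ratios)

k = int(np.searchsorted(cumulative, 0.95)) + 1         # smallest k reaching 95% of the variance
print("explained variance ratios:", np.round(ratios[:5], 3))
print("components needed for 95%:", k)
```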
Interpretation

Reading Principal Components

Loadings: what each component means

📐 PC1: Body Frame

High loadings on height (0.7) and arm_span (0.7) mean PC1 captures overall body frame size. When someone scores high on PC1, they tend to be tall with long arms.

PC1 = 0.7·height + 0.7·arm_span + 0.1·weight
⚖️ PC2: Body Mass

Almost all the loading is on weight (0.97), with minimal contribution from height or arm span. PC2 captures body mass independent of frame size.

PC2 = −0.1·height + 0.2·arm_span + 0.97·weight
📏 Unit Vectors

Each principal component is a unit vector in feature space — its squared loadings sum to 1. The loadings are direction cosines that tell you exactly how much each original feature contributes.

Σ loading² = 1
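To read loadings off a fitted model, you can inspect the rows of components_ in scikit-learn. The feature names and data below are illustrative stand-ins, not the body-measurement example above.

```python
import numpy as np
from sklearn.decomposition import PCA

features = ["height", "arm_span", "weight"]            # illustrative feature names
X = np.random.default_rng(7).normal(size=(300, 3))     # stand-in data

pca = PCA(n_components=2).fit(X)

for i, component in enumerate(pca.components_):        # each row holds one PC's loadings
    terms = " + ".join(f"{w:+.2f}·{name}" for w, name in zip(component, features))
    print(f"PC{i + 1} = {terms}")
    print("   sum of squared loadings:", round(float(np.sum(component ** 2)), 6))  # ≈ 1: unit vector
```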
🟡 Checkpoint

PCA requires the data to be centered (mean-subtracted). Should you also STANDARDIZE (divide by standard deviation) before PCA?

Limitations

When PCA Doesn’t Work

✓ PCA Works Well
• Linear correlations between features
• Gaussian-ish distributions
• Preprocessing / visualization
• Noise reduction / compression

✗ PCA Fails
• Non-linear manifolds (use t-SNE/UMAP)
• Categorical / discrete features
• When variance ≠ importance
• Supervised tasks (use LDA instead)
PCA finds LINEAR directions. Non-linear structure requires different methods.
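The classic way to see this limitation is the concentric-circles toy dataset: no single linear direction separates the rings, but a kernel method can. A sketch with scikit-learn (the RBF gamma value is just an illustrative choice):

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

Z_lin = PCA(n_components=1).fit_transform(X)                                  # linear projection
Z_rbf = KernelPCA(n_components=1, kernel="rbf", gamma=10).fit_transform(X)    # non-linear

# Compare how far apart the two rings' means land along each 1-D embedding:
# the linear projection mixes them, the kernel projection pulls them apart.
for name, Z in [("linear PCA", Z_lin), ("kernel PCA (rbf)", Z_rbf)]:
    gap = abs(Z[y == 0].mean() - Z[y == 1].mean())
    spread = Z.std()
    print(f"{name}: class-mean gap = {gap:.2f} (overall spread {spread:.2f})")
```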

🎓 What You Now Know

PCA = rotation to max-variance axes — Find eigenvectors of covariance matrix.

Eigenvalues = variance explained — Keep top k components that explain 95%+ variance.

Loadings reveal meaning — Components are weighted sums of original features.

Standardize first (usually) — Unless features share the same scale; see the sketch after this list.

Linear only — For non-linear structure, use t-SNE, UMAP, or kernel PCA.
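For the "standardize first" point, the usual pattern is to put a scaler in front of PCA in a pipeline. A minimal sketch, assuming two hypothetical features on wildly different scales:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Hypothetical features on very different scales: height in metres, income in dollars
rng = np.random.default_rng(0)
height = rng.normal(1.7, 0.1, 500)
income = 30_000 * height + rng.normal(0, 5_000, 500)
X = np.column_stack([height, income])

raw = PCA(n_components=1).fit(X)
scaled = make_pipeline(StandardScaler(), PCA(n_components=1)).fit(X)

print("no scaling:  ", np.round(raw.components_[0], 3))         # dominated by income's huge variance
print("standardized:", np.round(scaled[-1].components_[0], 3))  # both features contribute comparably
```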

PCA is one of the most important unsupervised techniques in your toolbox. It shows up everywhere: image compression, noise reduction, feature engineering, visualization, and preprocessing for nearly every ML pipeline. Master it, and you'll start spotting it in every dataset you touch. 🔬

Keep Learning

📄 A Tutorial on Principal Component Analysis (Shlens, 2014)