PCA — Compressing Reality Without Losing the Plot
A scroll-driven visual deep dive into Principal Component Analysis. Learn eigenvectors, variance maximization, dimensionality reduction, and when PCA transforms your data — and when it doesn't.
Dimensionality Reduction
100 features. 3 that matter.
PCA finds them.
Principal Component Analysis rotates your high-dimensional data to find the axes of maximum variance. Project onto the top few axes and you’ve compressed reality — keeping the signal, dropping the noise.
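As a concrete (if toy) sketch of that compression with scikit-learn: the 100-feature matrix below is synthetic stand-in data built from 3 hidden factors plus noise, not the article's dataset, so most of its variance really does live in 3 directions.

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in data: 500 samples, 100 features driven by only 3 latent factors + noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))
X = latent @ rng.normal(size=(3, 100)) + 0.1 * rng.normal(size=(500, 100))

# Rotate onto the axes of maximum variance and keep only the top 3.
pca = PCA(n_components=3)
Z = pca.fit_transform(X)                 # shape (500, 3): the compressed data

print(Z.shape)
print(pca.explained_variance_ratio_)     # fraction of variance each kept axis carries
```

With real, correlated data the top few ratios are usually large for the same reason: a handful of directions carry most of the spread.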
Find the Axis of Maximum Spread
PCA finds directions of maximum variance. Why is maximum variance a good criterion for compression?
💡 If a variable is constant (zero variance), does it tell you anything about individual data points?
If a feature has near-zero variance (all values are almost the same), it carries almost no information — dropping it loses nothing. Conversely, directions with high variance distinguish data points from each other — that's where the 'signal' lives. PCA keeps the high-variance directions and drops the low-variance ones. This is optimal under the assumption that variance = information, which works well for many (but not all) datasets.
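A tiny NumPy illustration of that assumption (the column names are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
height    = rng.normal(170, 10, n)           # high variance: distinguishes people
body_temp = 37.0 + rng.normal(0, 0.05, n)    # near-constant: carries ~no information

X = np.column_stack([height, body_temp])
print(X.var(axis=0))                         # roughly [100, 0.0025]

# Dropping the near-constant column loses almost nothing; dropping the
# high-variance column throws away what actually separates the samples.
```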
The Eigenvalue Recipe
PCA in 4 steps
1. Center: X̃ = X − μ
2. Covariance matrix: C = (1/n) X̃ᵀX̃
3. Eigendecompose: C = VΛVᵀ
4. Project: Z = X̃Vₖ

(A NumPy sketch of these four steps follows the worked answer below.)

You run PCA on 100-dimensional data and get eigenvalues λ₁=50, λ₂=30, λ₃=10, λ₄=5, λ₅=3, and the rest sum to 2. How much variance is explained by the top 3 components?
💡 Variance explained = sum of top k eigenvalues / sum of all eigenvalues...
Total variance = sum of ALL eigenvalues = 50+30+10+5+3+2 = 100. Top 3 eigenvalues = 50+30+10 = 90. Proportion = 90/100 = 90%. This means 3 components capture 90% of the information in 100 dimensions — a 97% reduction in dimensionality with only 10% information loss. In practice, we typically keep enough components to explain 95% of variance.
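Here is a hedged NumPy sketch of the four-step recipe, including the variance-explained calculation from the quiz; the toy matrix and variable names are illustrative, not the article's data.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6)) @ rng.normal(size=(6, 6))   # correlated toy data

# 1. Center
X_tilde = X - X.mean(axis=0)

# 2. Covariance matrix
C = (X_tilde.T @ X_tilde) / len(X_tilde)

# 3. Eigendecompose (eigh assumes a symmetric matrix), then sort descending
eigvals, V = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]

# 4. Project onto the top k eigenvectors
k = 3
Z = X_tilde @ V[:, :k]

# Variance explained = (sum of top-k eigenvalues) / (sum of all eigenvalues)
print(f"{eigvals[:k].sum() / eigvals.sum():.1%} of the variance kept in {k} of {X.shape[1]} dimensions")
```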
The Scree Plot
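A scree plot simply charts the eigenvalues in decreasing order; the usual reading is to look for the "elbow" where the curve flattens, or for where the cumulative curve crosses your target (95% in the summary below). A rough matplotlib sketch, using the eigenvalues from the quiz above with an invented tail so the total is 100:

```python
import numpy as np
import matplotlib.pyplot as plt

eigvals = np.array([50, 30, 10, 5, 3, 1, 0.6, 0.4])   # quiz values + illustrative tail
ratios = eigvals / eigvals.sum()
ks = np.arange(1, len(ratios) + 1)

plt.plot(ks, ratios, "o-", label="variance per component")
plt.plot(ks, np.cumsum(ratios), "s--", label="cumulative")
plt.axhline(0.95, color="gray", ls=":", label="95% target")
plt.xlabel("Principal component")
plt.ylabel("Fraction of variance explained")
plt.legend()
plt.show()
```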
Reading Principal Components
Loadings: what each component means
PC1 = 0.7·height + 0.7·arm_span + 0.1·weight
High loadings on height (0.7) and arm_span (0.7) mean PC1 captures overall body frame size. When someone scores high on PC1, they tend to be tall with long arms.

PC2 = −0.1·height + 0.2·arm_span + 0.97·weight
Almost all the loading is on weight (0.97), with minimal contribution from height or arm span. PC2 captures body mass independent of frame size.

Σ loading² = 1
Each principal component is a unit vector in feature space — its squared loadings sum to 1. The loadings are direction cosines that tell you exactly how much each original feature contributes.

PCA requires the data to be centered (mean-subtracted). Should you also STANDARDIZE (divide by standard deviation) before PCA?
💡 What happens if one feature is measured in dollars (range: 0–100,000) and another in meters (range: 0–2)?
If feature A ranges from 0-1 and feature B ranges from 0-100,000, feature B has much higher variance purely due to scale. PCA would pick B as PC1 regardless of actual information content. Standardizing (z-scoring) puts all features on equal footing. Exception: if features are already on the same scale (e.g., all gene expression levels), don't standardize — the variance differences are meaningful.
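A sketch of the standardize-then-PCA workflow in scikit-learn, wired up to echo the height/arm_span/weight example above; the synthetic data and the exact loadings it produces are assumptions, not the article's numbers.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
frame = rng.normal(size=500)                                  # latent "body frame size"
df = pd.DataFrame({
    "height":   170 + 10 * frame + rng.normal(0, 2, 500),     # cm
    "arm_span": 172 + 10 * frame + rng.normal(0, 2, 500),     # cm
    "weight":    70 + rng.normal(0, 12, 500),                 # kg, mostly independent of frame
})

# Standardize first: height/arm_span (cm) and weight (kg) are on different scales.
X_std = StandardScaler().fit_transform(df)

pca = PCA(n_components=2).fit(X_std)

# Loadings: one row per component, one column per original feature.
loadings = pd.DataFrame(pca.components_, columns=df.columns, index=["PC1", "PC2"])
print(loadings)                              # expect height/arm_span to dominate PC1, weight PC2
print((pca.components_ ** 2).sum(axis=1))    # each component is a unit vector: rows sum to 1
```

Skipping the StandardScaler step here would let whichever column happens to have the largest raw variance dominate PC1, which is exactly the pitfall described above.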
When PCA Doesn’t Work
PCA is a purely linear rotation. When the interesting structure is non-linear (a curve, a spiral, clusters on a curved manifold), projecting onto straight axes of maximum variance can miss it entirely; kernel PCA, t-SNE, or UMAP handle that case better.
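One way to see the limitation, sketched with scikit-learn (a stand-in demo, with the RBF bandwidth gamma picked by eye): two concentric circles have obvious structure that no single straight axis can capture, while a kernel PCA can pull the rings apart.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)  # y=1: inner ring

z_lin = PCA(n_components=1).fit_transform(X)                                # a straight axis
z_rbf = KernelPCA(n_components=1, kernel="rbf", gamma=10).fit_transform(X)  # non-linear axis

# Along the linear component the two rings stay interleaved; along the
# kernel component they typically land in distinct ranges.
for name, z in [("linear PCA", z_lin), ("kernel PCA", z_rbf)]:
    inner, outer = z[y == 1].ravel(), z[y == 0].ravel()
    print(f"{name:11s} inner: [{inner.min():.2f}, {inner.max():.2f}]  "
          f"outer: [{outer.min():.2f}, {outer.max():.2f}]")
```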
🎓 What You Now Know
✓ PCA = rotation to max-variance axes — Find eigenvectors of covariance matrix.
✓ Eigenvalues = variance explained — Keep top k components that explain 95%+ variance.
✓ Loadings reveal meaning — Components are weighted sums of original features.
✓ Standardize first (usually) — Unless features share the same scale.
✓ Linear only — For non-linear structure, use t-SNE, UMAP, or kernel PCA.
PCA is the most important unsupervised technique in your toolbox. It appears everywhere: image compression, noise reduction, feature engineering, visualization, and as preprocessing for nearly every ML pipeline. Master it, and you’ll see it everywhere. 🔬
↗ Keep Learning
K-Means Clustering — Grouping Data Without Labels
A scroll-driven visual deep dive into K-Means clustering. Learn the iterative algorithm, choosing K with the elbow method, limitations, and when to use alternatives.
Feature Engineering — The Art That Makes or Breaks Your Model
A scroll-driven visual deep dive into feature engineering. Learn transformations, encoding, interaction features, handling missing data, and why feature engineering matters more than model choice.