PCA — Compressing Reality Without Losing the Plot
A scroll-driven visual deep dive into Principal Component Analysis. Learn eigenvectors, variance maximization, dimensionality reduction, and when PCA transforms your data — and when it doesn't.
Dimensionality Reduction
100 features. 3 that matter.
PCA finds them.
Principal Component Analysis rotates your high-dimensional data to find the axes of maximum variance. Project onto the top few axes and you’ve compressed reality — keeping the signal, dropping the noise.
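As a concrete (if toy) sketch of that compression with scikit-learn: the 100-feature matrix below is synthetic stand-in data built from 3 hidden factors plus noise, not the article's dataset, so most of its variance really does live in 3 directions.

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in data: 500 samples, 100 features driven by only 3 latent factors + noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))
X = latent @ rng.normal(size=(3, 100)) + 0.1 * rng.normal(size=(500, 100))

# Rotate onto the axes of maximum variance and keep only the top 3.
pca = PCA(n_components=3)
Z = pca.fit_transform(X)                 # shape (500, 3): the compressed data

print(Z.shape)
print(pca.explained_variance_ratio_)     # fraction of variance each kept axis carries
```

With real, correlated data the top few ratios are usually large for the same reason: a handful of directions carry most of the spread.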
Find the Axis of Maximum Spread
PCA finds directions of maximum variance. Why is maximum variance a good criterion for compression?
💡 If a variable is constant (zero variance), does it tell you anything about individual data points?
If a feature has near-zero variance (all values are almost the same), it carries almost no information — dropping it loses nothing. Conversely, directions with high variance distinguish data points from each other — that's where the 'signal' lives. PCA keeps the high-variance directions and drops the low-variance ones. This is optimal under the assumption that variance = information, which works well for many (but not all) datasets.
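A tiny NumPy illustration of that assumption (the column names are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
height    = rng.normal(170, 10, n)           # high variance: distinguishes people
body_temp = 37.0 + rng.normal(0, 0.05, n)    # near-constant: carries ~no information

X = np.column_stack([height, body_temp])
print(X.var(axis=0))                         # roughly [100, 0.0025]

# Dropping the near-constant column loses almost nothing; dropping the
# high-variance column throws away what actually separates the samples.
```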
The Eigenvalue Recipe
PCA in 4 steps
1. Center: X̃ = X − μ
2. Covariance matrix: C = (1/n) X̃ᵀX̃
3. Eigendecompose: C = VΛVᵀ
4. Project: Z = X̃Vₖ

(A NumPy sketch of these four steps follows the worked answer below.)

You run PCA on 100-dimensional data and get eigenvalues λ₁=50, λ₂=30, λ₃=10, λ₄=5, λ₅=3, and the rest sum to 2. How much variance is explained by the top 3 components?
💡 Variance explained = sum of top k eigenvalues / sum of all eigenvalues...
Total variance = sum of ALL eigenvalues = 50+30+10+5+3+2 = 100. Top 3 eigenvalues = 50+30+10 = 90. Proportion = 90/100 = 90%. This means 3 components capture 90% of the information in 100 dimensions — a 97% reduction in dimensionality with only 10% information loss. In practice, we typically keep enough components to explain 95% of variance.
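Here is a hedged NumPy sketch of the four-step recipe, including the variance-explained calculation from the quiz; the toy matrix and variable names are illustrative, not the article's data.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6)) @ rng.normal(size=(6, 6))   # correlated toy data

# 1. Center
X_tilde = X - X.mean(axis=0)

# 2. Covariance matrix
C = (X_tilde.T @ X_tilde) / len(X_tilde)

# 3. Eigendecompose (eigh assumes a symmetric matrix), then sort descending
eigvals, V = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]

# 4. Project onto the top k eigenvectors
k = 3
Z = X_tilde @ V[:, :k]

# Variance explained = (sum of top-k eigenvalues) / (sum of all eigenvalues)
print(f"{eigvals[:k].sum() / eigvals.sum():.1%} of the variance kept in {k} of {X.shape[1]} dimensions")
```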
The Scree Plot
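A scree plot simply charts the eigenvalues in decreasing order; the usual reading is to look for the "elbow" where the curve flattens, or for where the cumulative curve crosses your target (95% in the summary below). A rough matplotlib sketch, using the eigenvalues from the quiz above with an invented tail so the total is 100:

```python
import numpy as np
import matplotlib.pyplot as plt

eigvals = np.array([50, 30, 10, 5, 3, 1, 0.6, 0.4])   # quiz values + illustrative tail
ratios = eigvals / eigvals.sum()
ks = np.arange(1, len(ratios) + 1)

plt.plot(ks, ratios, "o-", label="variance per component")
plt.plot(ks, np.cumsum(ratios), "s--", label="cumulative")
plt.axhline(0.95, color="gray", ls=":", label="95% target")
plt.xlabel("Principal component")
plt.ylabel("Fraction of variance explained")
plt.legend()
plt.show()
```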
Reading Principal Components
Loadings: what each component means
PC1 = 0.7·height + 0.7·arm_span + 0.1·weight
High loadings on height (0.7) and arm_span (0.7) mean PC1 captures overall body frame size. When someone scores high on PC1, they tend to be tall with long arms.

PC2 = −0.1·height + 0.2·arm_span + 0.97·weight
Almost all the loading is on weight (0.97), with minimal contribution from height or arm span. PC2 captures body mass independent of frame size.

Σ loading² = 1
Each principal component is a unit vector in feature space — its squared loadings sum to 1. The loadings are direction cosines that tell you exactly how much each original feature contributes.

PCA requires the data to be centered (mean-subtracted). Should you also STANDARDIZE (divide by standard deviation) before PCA?
💡 What happens if one feature is measured in dollars (range: 0–100,000) and another in meters (range: 0–2)?
If feature A ranges from 0-1 and feature B ranges from 0-100,000, feature B has much higher variance purely due to scale. PCA would pick B as PC1 regardless of actual information content. Standardizing (z-scoring) puts all features on equal footing. Exception: if features are already on the same scale (e.g., all gene expression levels), don't standardize — the variance differences are meaningful.
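A sketch of the standardize-then-PCA workflow in scikit-learn, wired up to echo the height/arm_span/weight example above; the synthetic data and the exact loadings it produces are assumptions, not the article's numbers.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
frame = rng.normal(size=500)                                  # latent "body frame size"
df = pd.DataFrame({
    "height":   170 + 10 * frame + rng.normal(0, 2, 500),     # cm
    "arm_span": 172 + 10 * frame + rng.normal(0, 2, 500),     # cm
    "weight":    70 + rng.normal(0, 12, 500),                 # kg, mostly independent of frame
})

# Standardize first: height/arm_span (cm) and weight (kg) are on different scales.
X_std = StandardScaler().fit_transform(df)

pca = PCA(n_components=2).fit(X_std)

# Loadings: one row per component, one column per original feature.
loadings = pd.DataFrame(pca.components_, columns=df.columns, index=["PC1", "PC2"])
print(loadings)                              # expect height/arm_span to dominate PC1, weight PC2
print((pca.components_ ** 2).sum(axis=1))    # each component is a unit vector: rows sum to 1
```

Skipping the StandardScaler step here would let whichever column happens to have the largest raw variance dominate PC1, which is exactly the pitfall described above.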
When PCA Doesn’t Work
PCA is a purely linear rotation. When the interesting structure is non-linear (a curve, a spiral, clusters on a curved manifold), projecting onto straight axes of maximum variance can miss it entirely; kernel PCA, t-SNE, or UMAP handle that case better.
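One way to see the limitation, sketched with scikit-learn (a stand-in demo, with the RBF bandwidth gamma picked by eye): two concentric circles have obvious structure that no single straight axis can capture, while a kernel PCA can pull the rings apart.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)  # y=1: inner ring

z_lin = PCA(n_components=1).fit_transform(X)                                # a straight axis
z_rbf = KernelPCA(n_components=1, kernel="rbf", gamma=10).fit_transform(X)  # non-linear axis

# Along the linear component the two rings stay interleaved; along the
# kernel component they typically land in distinct ranges.
for name, z in [("linear PCA", z_lin), ("kernel PCA", z_rbf)]:
    inner, outer = z[y == 1].ravel(), z[y == 0].ravel()
    print(f"{name:11s} inner: [{inner.min():.2f}, {inner.max():.2f}]  "
          f"outer: [{outer.min():.2f}, {outer.max():.2f}]")
```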
🎓 What You Now Know
✓ PCA = rotation to max-variance axes — Find eigenvectors of covariance matrix.
✓ Eigenvalues = variance explained — Keep top k components that explain 95%+ variance.
✓ Loadings reveal meaning — Components are weighted sums of original features.
✓ Standardize first (usually) — Unless features share the same scale.
✓ Linear only — For non-linear structure, use t-SNE, UMAP, or kernel PCA.
PCA is the most important unsupervised technique in your toolbox. It appears everywhere: image compression, noise reduction, feature engineering, visualization, and as preprocessing for nearly every ML pipeline. Master it, and you’ll see it everywhere. 🔬
↗ Keep Learning
K-Means Clustering — Grouping Data Without Labels
A scroll-driven visual deep dive into K-Means clustering. Learn the iterative algorithm, choosing K with the elbow method, limitations, and when to use alternatives.
Feature Engineering — The Art That Makes or Breaks Your Model
A scroll-driven visual deep dive into feature engineering. Learn transformations, encoding, interaction features, handling missing data, and why feature engineering matters more than model choice.