Random Forests — Why 1000 Bad Models Beat 1 Good One
A scroll-driven visual deep dive into Random Forests. Learn bagging, feature randomness, out-of-bag error, and why ensembles are the most reliable ML technique.
Ensemble Methods
One tree is unstable.
A forest is unshakeable.
Train 500 decision trees on random subsets of data and features. Average their predictions. The errors cancel out, the signal adds up. That’s the Random Forest — the most reliable algorithm in ML.
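Want to see that in code before we dissect it? A minimal sketch, assuming scikit-learn and its built-in breast cancer dataset (our choice of library and data, not part of the lesson):

```python
# Train a 500-tree Random Forest and score the averaged ensemble.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=500, random_state=0)
forest.fit(X_train, y_train)                 # each tree sees its own bootstrap sample
print(forest.score(X_test, y_test))          # accuracy of the combined (averaged) predictions
```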
Step 1: Bootstrap Aggregating (Bagging)
Why averaging reduces variance
Var(single tree) = σ²
Var(average of B independent trees) = σ²/B
But trees aren't fully independent... Solution: make trees MORE different.
Each bootstrap sample draws N points from a dataset of N points WITH replacement. What fraction of original points are typically left out?
💡 What's the probability of NOT picking a specific card from a deck of N cards, N times in a row?
The probability of a specific point NOT being selected in one draw is (1 - 1/N). Over N draws, the probability of never being selected is (1 - 1/N)^N → 1/e ≈ 0.368 as N grows. So about 37% of points are left out of each bootstrap sample. These 'out-of-bag' points become a free validation set — no need for a separate train/test split!
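You can verify the ~37% figure yourself with a one-line simulation. A quick sketch, assuming NumPy:

```python
# Draw N indices with replacement and count how many originals never appear.
import numpy as np

rng = np.random.default_rng(0)
N = 10_000
sample = rng.integers(0, N, size=N)      # one bootstrap sample of size N
left_out = N - np.unique(sample).size    # points that were never drawn
print(left_out / N)                      # ≈ 0.368, i.e. about 1/e
```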
Step 2: Random Feature Selection
How Random Forest differs from Bagged Trees
Bagged trees: each split considers ALL p features. Since every tree splits on the same dominant features first, the trees end up highly correlated — reducing the benefit of averaging.
Random Forest: each split only considers √p random features. This forces trees to use different features and find different patterns, making them decorrelated and far more diverse.
The ensemble variance depends on the average correlation ρ between trees. Lower correlation means lower variance — this is why Random Forest beats plain bagging.
Var(RF) = ρσ² + (1−ρ)σ²/B
You have 100 features. At each split in a Random Forest, how many features does a tree consider?
💡 p = 100, and the default for classification is √p...
The default for classification is √p = √100 = 10 random features per split. Crucially, a NEW random subset is drawn at EVERY split, not just once per tree. This means even within a single tree, different nodes use different feature subsets. For regression, the default is p/3 ≈ 33 features per split.
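In code this knob is usually called something like max_features. A hedged sketch using scikit-learn's names (note that recent scikit-learn versions default the regressor to all features rather than p/3, so we set it explicitly):

```python
# max_features controls how many features each split considers;
# a fresh random subset is drawn at EVERY split, not once per tree.
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

clf = RandomForestClassifier(max_features="sqrt")   # √p features per split (classification default)
reg = RandomForestRegressor(max_features=1/3)       # a fraction: roughly p/3 features per split
```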
Free Validation: Out-of-Bag Error
Why is OOB error nearly as good as cross-validation error?
💡 What's the defining property of good model evaluation? The model hasn't seen the test data during training...
In k-fold cross-validation, each point is evaluated by a model that didn't train on it. OOB does the same thing naturally: each point is evaluated by the ~37% of trees that didn't include it in their bootstrap sample. The key insight is that these trees have never 'seen' the point — so their predictions are unbiased estimates of generalization performance. And you get this for free, without the computational cost of multiple train/test splits!
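If you use scikit-learn, this estimate is one flag away. A minimal sketch (library and dataset are assumptions):

```python
# oob_score=True asks the forest to evaluate each point using only
# the trees whose bootstrap samples did not contain it.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
forest = RandomForestClassifier(n_estimators=300, oob_score=True, random_state=0)
forest.fit(X, y)
print(forest.oob_score_)   # accuracy estimated from out-of-bag predictions only
```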
Which Features Matter Most?
Two ways to measure feature importance
Impurity-based importance: sum up the Gini or entropy reduction from every split that uses feature f, across all trees. Fast and built-in, but biased toward high-cardinality features (like IDs) that get artificially inflated importance.
Importance(f) = Σ(gain from splits on feature f)
Permutation importance: shuffle a feature's values randomly and measure how much accuracy drops. If accuracy drops a lot, the feature was important. This measures actual prediction impact, not just split frequency — more reliable than the impurity-based measure.
Importance(f) = accuracy_original − accuracy_shuffled
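Both measures are available off the shelf in scikit-learn. A hedged sketch (feature_importances_ and permutation_importance are scikit-learn's names, not something the forest math requires):

```python
# Compare impurity-based and permutation importance for the same forest.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_train, y_train)

impurity_imp = forest.feature_importances_            # Gini-based, computed during training
perm = permutation_importance(forest, X_test, y_test, # shuffle each feature on held-out data
                              n_repeats=10, random_state=0)
perm_imp = perm.importances_mean                      # average accuracy drop per feature
```

Measuring permutation importance on held-out data (here the test split) is what keeps it honest: a feature that only helped the trees memorize the training set won't hurt accuracy when shuffled.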
Random Forest: The Reliable Default
Adding more trees to a Random Forest (e.g., going from 100 to 10,000 trees):
💡 Does averaging MORE independent estimates make predictions worse?
This is one of Random Forest's best properties: you CANNOT overfit by adding more trees. Each tree is an independent estimate, and averaging more estimates only reduces variance. However, the returns diminish: going from 10 to 100 trees is huge, 100 to 500 is noticeable, 500 to 5000 is marginal. The only cost is compute time. In practice, 300-500 trees is usually sufficient.
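You can watch the diminishing returns yourself by growing one forest incrementally and tracking its OOB error. A sketch assuming scikit-learn (warm_start keeps the already-grown trees when n_estimators increases):

```python
# Grow the same forest from 10 to 2000 trees and watch the OOB error flatten out.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
forest = RandomForestClassifier(warm_start=True, oob_score=True, random_state=0)

for n in (10, 100, 500, 2000):
    forest.set_params(n_estimators=n)   # warm_start: only the new trees are trained
    forest.fit(X, y)
    print(n, 1 - forest.oob_score_)     # error drops sharply at first, then barely moves
    # (with very few trees, scikit-learn may warn that some points have no OOB prediction yet)
```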
🎓 What You Now Know
✓ Bagging averages noisy models to reduce variance — Train B trees on bootstrap samples, average predictions.
✓ Feature randomness decorrelates trees — Only √p random features per split forces diversity.
✓ OOB error = free validation — ~37% of points are left out of each tree, giving unbiased error estimates.
✓ More trees never hurts — Can’t overfit by adding trees. Diminishing returns after ~500.
✓ Best default for tabular data — No scaling, no tuning, just works. Start here.
Random Forest is the AK-47 of machine learning: reliable, robust, hard to misuse. Every ML practitioner should have it in their toolkit. And understanding it prepares you for Gradient Boosting — the technique that takes ensemble methods to the next level. 🚀
↗ Keep Learning
Decision Trees — How Machines Learn to Ask Questions
A scroll-driven visual deep dive into decision trees. Learn how trees split data, what Gini impurity and information gain mean, and why trees overfit like crazy.
Gradient Boosting & XGBoost — The Kaggle King
A scroll-driven visual deep dive into gradient boosting. Learn how weak learners combine sequentially, how XGBoost optimizes the process, and why it dominates tabular ML competitions.
Bagging vs Boosting — The Two Philosophies of Ensemble Learning
A scroll-driven visual deep dive comparing bagging and boosting. Learn when to average independent models vs sequentially correct errors, and why ensembles dominate ML.