Ensemble Methods

Scenario: Predicting Loan Defaults in Banking

You're a data scientist at a bank predicting which loan applicants will default. A single decision tree overfits and is unstable—use ensemble methods like random forests or gradient boosting to combine hundreds of trees for robust, accurate predictions that generalize well to new applicants.

Learning Objectives

By the end of this module (30-40 minutes), you should be able to:

  • Explain the differences between bagging, boosting, and stacking.
  • Implement random forests and gradient boosting with scikit-learn.
  • Tune ensemble hyperparameters like number of estimators and learning rate.
  • Interpret feature importances from tree-based ensembles.
  • Choose the right ensemble strategy for your data and task.

Prerequisites: Basic scikit-learn (fit, predict); understanding of decision trees. Difficulty: Intermediate.

What This Is

Ensemble methods combine multiple models to produce a stronger prediction than any single model alone. The three core strategies are bagging, boosting, and stacking. Each addresses a different weakness.

When You Use It

  • when a single model is unstable or has high variance (use bagging)
  • when a single model is too weak and underfits (use boosting)
  • when you want to combine diverse model types for maximum performance (use stacking)
  • when you need a strong baseline that does not require careful feature engineering

Strategy Comparison

Strategy | How It Works | Reduces | Risk
Bagging | train many models on bootstrap samples, average predictions | variance | limited improvement on bias
Boosting | train models sequentially, each correcting the previous errors | bias | can overfit if not regularized
Stacking | train diverse base models, then train a meta-model on their outputs | both | more complex, risk of leakage
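The variance-reduction claim for bagging is easy to check empirically. The sketch below uses synthetic data as a stand-in for the loan features, so the exact scores are illustrative, but the pattern (a bagged ensemble of unstable trees beating a single tree) is the point:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the loan data; features and labels are illustrative.
X, y = make_classification(n_samples=1000, n_features=20, flip_y=0.1, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

# A single deep tree: high-variance base model.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# The same tree, bagged: trained on 100 bootstrap samples, predictions averaged.
bag = BaggingClassifier(
    DecisionTreeClassifier(random_state=0),  # base estimator (passed positionally)
    n_estimators=100,
    random_state=0,
).fit(X_train, y_train)

print(f"single tree:  {tree.score(X_valid, y_valid):.3f}")
print(f"bagged trees: {bag.score(X_valid, y_valid):.3f}")
```

With the label noise injected by flip_y, the single tree chases noise while the bagged average smooths it out.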

Tooling

Estimator | Type | When to use
RandomForestClassifier | bagging | strong default for tabular data
GradientBoostingClassifier | boosting | when you need sequential error correction
HistGradientBoostingClassifier | boosting | faster on larger datasets, handles missing values natively
AdaBoostClassifier | boosting | simpler boosting baseline
BaggingClassifier | bagging | wrapping any base estimator with bootstrap
VotingClassifier | voting | simple combination of diverse models
StackingClassifier | stacking | when you want a meta-learner on top of base models
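VotingClassifier is the only combiner in the table without an example below, so here is a minimal sketch. The data is synthetic and the three member models are arbitrary choices, picked only because they are diverse:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

# "soft" voting averages predicted probabilities; every member must
# therefore support predict_proba.
vote = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(max_depth=5)),
        ("nb", GaussianNB()),
    ],
    voting="soft",
)
vote.fit(X_train, y_train)
print(f"Validation accuracy: {vote.score(X_valid, y_valid):.3f}")
```

Unlike stacking, voting has no meta-learner: the combination rule is fixed (majority or probability average), which makes it simpler but less flexible.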

Minimal Examples

Random Forest

from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=200, max_depth=8, random_state=0)
rf.fit(X_train, y_train)
print(f"Validation accuracy: {rf.score(X_valid, y_valid):.3f}")
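One detail worth knowing: because each tree is trained on a bootstrap sample, it never sees roughly a third of the training rows, so a random forest can estimate its own generalization via the out-of-bag score, without a separate validation split. A sketch on synthetic stand-in data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# oob_score=True scores each row using only the trees that did not train on it.
rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X, y)
print(f"OOB accuracy estimate: {rf.oob_score_:.3f}")
```

The OOB estimate is convenient for quick comparisons, though a proper held-out set is still the honest final check.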

Gradient Boosting

from sklearn.ensemble import HistGradientBoostingClassifier

gb = HistGradientBoostingClassifier(max_iter=200, max_depth=4, learning_rate=0.1)
gb.fit(X_train, y_train)
print(f"Validation accuracy: {gb.score(X_valid, y_valid):.3f}")

Stacking

from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

estimators = [
    ("dt", DecisionTreeClassifier(max_depth=5)),
    ("svc", SVC(kernel="rbf", probability=True)),
]
stack = StackingClassifier(
    estimators=estimators,
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X_train, y_train)
print(f"Validation accuracy: {stack.score(X_valid, y_valid):.3f}")

Feature Importance

Ensemble methods provide built-in feature importance:

importances = rf.feature_importances_
sorted_idx = importances.argsort()[::-1]
for i in sorted_idx[:10]:
    print(f"  {feature_names[i]:>25}: {importances[i]:.4f}")

Use this as a debugging tool, not as ground truth—permutation importance is a more reliable check.
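The permutation check mentioned above is available in sklearn.inspection. A sketch, again on synthetic data; the feature names are placeholders:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=8, n_informative=3, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)
feature_names = [f"feat_{i}" for i in range(X.shape[1])]  # placeholder names

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Shuffle each column of the *validation* set and measure the score drop;
# a large drop means the model genuinely relies on that feature.
result = permutation_importance(rf, X_valid, y_valid, n_repeats=10, random_state=0)
for i in result.importances_mean.argsort()[::-1]:
    print(f"  {feature_names[i]:>8}: {result.importances_mean[i]:.4f}")
```

Impurity-based importances can inflate high-cardinality features; permutation importance measured on held-out data avoids that bias.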

Failure Pattern

A classic failure: training a random forest with unlimited depth on a small dataset and trusting the training score. The model memorizes every training example but generalizes poorly.
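The gap is easy to demonstrate. On a small, noisy synthetic dataset (the sizes and noise level here are illustrative), an unconstrained forest posts a near-perfect training score that the validation set does not support:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Small and noisy: 200 rows, with 20% of labels flipped at random.
X, y = make_classification(n_samples=200, n_features=20, flip_y=0.2, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

# max_depth=None: every tree grows until it memorizes its bootstrap sample.
rf = RandomForestClassifier(n_estimators=200, max_depth=None, random_state=0)
rf.fit(X_train, y_train)

train_acc = rf.score(X_train, y_train)
valid_acc = rf.score(X_valid, y_valid)
print(f"train {train_acc:.3f} vs. validation {valid_acc:.3f}")
```

The flipped labels cap the achievable validation accuracy, yet the forest still fits them on the training side, which is exactly the memorization the failure pattern describes.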

Another trap: fitting the meta-learner on base-model predictions made without cross-validation, so the stacking layer learns from predictions on data the base models have already seen. StackingClassifier's cv parameter exists to prevent exactly this leakage.

Common Mistakes

  • setting n_estimators too low and giving up on the model too early
  • not tuning max_depth or learning_rate for boosting models
  • treating feature importance as definitive without checking with permutation importance
  • assuming ensembles always beat simple models (they do not when the data is tiny or the features are weak)

Practice

  1. Compare a single decision tree against a random forest on the same split.
  2. Compare random forest against gradient boosting and explain which one wins and why.
  3. Inspect feature importances and name one feature that might be spurious.
  4. Build a stacking classifier and compare it against the best single model.
  5. Explain why boosting can overfit if the learning rate is too high.

Case Study: XGBoost in Kaggle Competitions

XGBoost, a gradient boosting implementation, won numerous Kaggle competitions by providing fast, accurate ensembles. Its success showed that well-tuned boosting can outperform complex neural networks on tabular data, and it reshaped standard ML workflows for structured datasets.

Expanded Quick Quiz

What's the main difference between bagging and boosting?

Answer: Bagging trains models independently on random subsets; boosting trains sequentially, each correcting previous errors.

Why do random forests reduce overfitting compared to single trees?

Answer: By averaging predictions from many trees trained on different bootstraps, they reduce variance without increasing bias much.

How does gradient boosting work?

Answer: It fits models to the residuals of previous models, gradually reducing the error by adding weak learners.
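The residual-fitting idea can be written out in a few lines for the regression case, assuming squared-error loss (where the negative gradient is exactly the residual). A minimal sketch on made-up data, not a production implementation:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)

learning_rate = 0.1
pred = np.full_like(y, y.mean())  # start from the constant mean prediction
for _ in range(100):
    residual = y - pred                          # negative gradient of squared error
    stump = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    pred += learning_rate * stump.predict(X)     # small step toward fixing the errors

print(f"MSE of mean baseline: {np.mean((y - y.mean()) ** 2):.4f}")
print(f"MSE after boosting:   {np.mean((y - pred) ** 2):.4f}")
```

Each weak learner only has to model what the ensemble so far gets wrong, which is why shallow trees suffice and why the learning rate controls how aggressively errors are corrected.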

In the loan default scenario, why use ensembles?

Answer: To create stable predictions that generalize better than a single overfitted tree, improving reliability for financial decisions.

Progress Checkpoint

  • [ ] Trained a random forest and gradient boosting model on tabular data.
  • [ ] Tuned hyperparameters and compared validation performance.
  • [ ] Analyzed feature importances and identified key predictors.
  • [ ] Answered quiz questions without peeking.

Milestone: Complete this to unlock "Hyperparameter Tuning" in the Classical ML track. Share your ensemble comparison in the academy Discord!

Further Reading

  • Scikit-Learn Ensemble Guide.
  • XGBoost documentation for advanced boosting.
  • "Elements of Statistical Learning" for theoretical foundations.

Longer Connection

Continue with Hyperparameter Tuning for systematic search across ensemble hyperparameters, and Cross-Validation for the split strategy that keeps ensemble evaluation honest.