Ensemble Methods¶
Scenario: Predicting Loan Defaults in Banking¶
You're a data scientist at a bank predicting which loan applicants will default. A single decision tree overfits and is unstable, so you turn to ensemble methods like random forests or gradient boosting, which combine hundreds of trees into robust, accurate predictions that generalize well to new applicants.
Learning Objectives¶
By the end of this module (30-40 minutes), you should be able to:

- Explain the differences between bagging, boosting, and stacking.
- Implement random forests and gradient boosting with scikit-learn.
- Tune ensemble hyperparameters like number of estimators and learning rate.
- Interpret feature importances from tree-based ensembles.
- Choose the right ensemble strategy for your data and task.
Prerequisites: Basic scikit-learn (fit, predict); understanding of decision trees. Difficulty: Intermediate.
What This Is¶
Ensemble methods combine multiple models to produce a stronger prediction than any single model alone. The three core strategies are bagging, boosting, and stacking. Each addresses a different weakness.
When You Use It¶
- when a single model is unstable or has high variance (use bagging)
- when a single model is too weak and underfits (use boosting)
- when you want to combine diverse model types for maximum performance (use stacking)
- when you need a strong baseline that does not require careful feature engineering
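The variance-reduction case can be seen directly: on the same split, a fully grown single tree memorizes the training data, while a forest averages many such trees. A minimal sketch on synthetic data (the dataset and all parameters here are illustrative stand-ins, not part of the loan scenario):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a tabular dataset.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                           random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

# Unlimited-depth tree: fits training data perfectly, high variance.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
# Bagged ensemble of such trees: averages the variance away.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

print(f"single tree, valid: {tree.score(X_valid, y_valid):.3f}")
print(f"forest, valid:      {forest.score(X_valid, y_valid):.3f}")
```

Both models typically reach perfect training accuracy; the gap shows up on the validation split.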
Strategy Comparison¶
| Strategy | How It Works | Reduces | Risk |
|---|---|---|---|
| Bagging | train many models on bootstrap samples, average predictions | variance | limited improvement on bias |
| Boosting | train models sequentially, each correcting the previous errors | bias | can overfit if not regularized |
| Stacking | train diverse base models, then train a meta-model on their outputs | both | more complex, risk of leakage |
Tooling¶
| Estimator | Type | When to use |
|---|---|---|
| RandomForestClassifier | bagging | strong default for tabular data |
| GradientBoostingClassifier | boosting | when you need sequential error correction |
| HistGradientBoostingClassifier | boosting | faster on larger datasets, handles missing values natively |
| AdaBoostClassifier | boosting | simpler boosting baseline |
| BaggingClassifier | bagging | wrapping any base estimator with bootstrap samples |
| VotingClassifier | voting | simple combination of diverse models |
| StackingClassifier | stacking | when you want a meta-learner on top of base models |
Minimal Examples¶
Random Forest¶
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(n_estimators=200, max_depth=8, random_state=0)
rf.fit(X_train, y_train)
print(f"Validation accuracy: {rf.score(X_valid, y_valid):.3f}")
Gradient Boosting¶
from sklearn.ensemble import HistGradientBoostingClassifier
gb = HistGradientBoostingClassifier(max_iter=200, max_depth=4, learning_rate=0.1)
gb.fit(X_train, y_train)
print(f"Validation accuracy: {gb.score(X_valid, y_valid):.3f}")
Stacking¶
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
estimators = [
("dt", DecisionTreeClassifier(max_depth=5)),
("svc", SVC(kernel="rbf", probability=True)),
]
stack = StackingClassifier(
estimators=estimators,
final_estimator=LogisticRegression(),
cv=5,
)
stack.fit(X_train, y_train)
print(f"Validation accuracy: {stack.score(X_valid, y_valid):.3f}")
Feature Importance¶
Ensemble methods provide built-in feature importance:
importances = rf.feature_importances_
sorted_idx = importances.argsort()[::-1]
for i in sorted_idx[:10]:
print(f" {feature_names[i]:>25}: {importances[i]:.4f}")
Use this as a debugging tool, not as ground truth: impurity-based importances are biased toward high-cardinality features, and permutation importance is a more reliable check.
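A sketch of that permutation check with scikit-learn, shown self-contained on synthetic data (the dataset and split are illustrative; in practice you would pass your own fitted model and held-out split):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=12, n_informative=5,
                           random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Shuffle each feature on the held-out set and measure the score drop.
result = permutation_importance(rf, X_valid, y_valid, n_repeats=10, random_state=0)
for i in result.importances_mean.argsort()[::-1][:5]:
    print(f"feature {i}: {result.importances_mean[i]:.4f}"
          f" +/- {result.importances_std[i]:.4f}")
```

Because it is computed on held-out data, a feature whose permutation barely moves the score contributes little to generalization, whatever its impurity-based importance says.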
Failure Pattern¶
Using a random forest with unlimited depth on a small dataset and trusting the training score. The model memorizes every training example but generalizes poorly.
Another trap: stacking models without proper cross-validation for the meta-learner, which causes the stacking layer to see data it was trained on.
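The first trap above is easy to reproduce. A minimal sketch on a tiny, noisy synthetic dataset (sizes and noise level are illustrative) showing why the training score is the wrong number to trust:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Tiny, noisy dataset: easy to memorize, hard to generalize.
X, y = make_classification(n_samples=80, n_features=20, n_informative=3,
                           flip_y=0.2, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

# Unlimited depth: every tree memorizes its bootstrap sample, noise included.
rf = RandomForestClassifier(n_estimators=100, max_depth=None,
                            random_state=0).fit(X_train, y_train)
print(f"train: {rf.score(X_train, y_train):.2f}")
print(f"valid: {rf.score(X_valid, y_valid):.2f}")
```

The gap between the two scores is the overfitting the training score alone would hide.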
Common Mistakes¶
- setting `n_estimators` too low and giving up on the model too early
- not tuning `max_depth` or `learning_rate` for boosting models
- treating feature importance as definitive without checking with permutation importance
- assuming ensembles always beat simple models (they do not when the data is tiny or the features are weak)
Practice¶
- Compare a single decision tree against a random forest on the same split.
- Compare random forest against gradient boosting and explain which one wins and why.
- Inspect feature importances and name one feature that might be spurious.
- Build a stacking classifier and compare it against the best single model.
- Explain why boosting can overfit if the learning rate is too high.
Case Study: XGBoost in Kaggle Competitions¶
XGBoost, a gradient boosting implementation, won numerous Kaggle competitions by providing fast, accurate ensembles. Its success showed how tuned boosting can outperform complex neural networks on tabular data, revolutionizing ML workflows.
Expanded Quick Quiz¶
What's the main difference between bagging and boosting?
Answer: Bagging trains models independently on random subsets; boosting trains sequentially, each correcting previous errors.
Why do random forests reduce overfitting compared to single trees?
Answer: By averaging predictions from many trees trained on different bootstraps, they reduce variance without increasing bias much.
How does gradient boosting work?
Answer: It fits models to the residuals of previous models, gradually reducing the error by adding weak learners.
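The residual-fitting loop in that answer can be sketched from scratch for regression (illustrative only; real implementations add regularization, shrinkage schedules, and loss-specific gradients):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())  # start from the constant mean model
for _ in range(100):
    residuals = y - prediction                      # what the ensemble still gets wrong
    stump = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    prediction += learning_rate * stump.predict(X)  # small step toward the residuals

print(f"final training MSE: {np.mean((y - prediction) ** 2):.4f}")
```

Each weak learner fits only the remaining error, which is why a small `learning_rate` with many rounds reduces bias gradually instead of chasing noise.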
In the loan default scenario, why use ensembles?
Answer: To create stable predictions that generalize better than a single overfitted tree, improving reliability for financial decisions.
Progress Checkpoint¶
- [ ] Trained a random forest and gradient boosting model on tabular data.
- [ ] Tuned hyperparameters and compared validation performance.
- [ ] Analyzed feature importances and identified key predictors.
- [ ] Answered quiz questions without peeking.
Milestone: Complete this to unlock "Hyperparameter Tuning" in the Classical ML track. Share your ensemble comparison in the academy Discord!
Further Reading¶
- Scikit-Learn Ensemble Guide.
- XGBoost documentation for advanced boosting.
- "Elements of Statistical Learning" for theoretical foundations.
Runnable Example¶
Longer Connection¶
Continue with Hyperparameter Tuning for systematic search across ensemble hyperparameters, and Cross-Validation for the split strategy that keeps ensemble evaluation honest.