Ensemble Methods

Scenario: Predicting Loan Defaults in Banking

You're a data scientist at a bank predicting which loan applicants will default. A single decision tree overfits and is unstable—use ensemble methods like random forests or gradient boosting to combine hundreds of trees for robust, accurate predictions that generalize well to new applicants.

Learning Objectives

By the end of this module (30-40 minutes), you should be able to:

  • Explain the differences between bagging, boosting, and stacking.
  • Implement random forests and gradient boosting with scikit-learn.
  • Tune ensemble hyperparameters like number of estimators and learning rate.
  • Interpret feature importances from tree-based ensembles.
  • Choose the right ensemble strategy for your data and task.

Prerequisites: Basic scikit-learn (fit, predict); understanding of decision trees. Difficulty: Intermediate.

What This Is

Ensemble methods combine multiple models to produce a stronger prediction than any single model alone. The three core strategies are bagging, boosting, and stacking. Each addresses a different weakness.

When You Use It

  • when a single model is unstable or has high variance (use bagging)
  • when a single model is too weak and underfits (use boosting)
  • when you want to combine diverse model types for maximum performance (use stacking)
  • when you need a strong baseline that does not require careful feature engineering

Strategy Comparison

Strategy | How It Works | Reduces | Risk
Bagging | train many models on bootstrap samples, average predictions | variance | limited improvement on bias
Boosting | train models sequentially, each correcting the previous errors | bias | can overfit if not regularized
Stacking | train diverse base models, then train a meta-model on their outputs | both | more complex, risk of leakage
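The variance-reduction claim for bagging is easy to check empirically. The sketch below uses synthetic data as a stand-in for the loan features, so the exact scores are illustrative, but the pattern (a bagged ensemble of unstable trees beating a single tree) is the point:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the loan data; features and labels are illustrative.
X, y = make_classification(n_samples=1000, n_features=20, flip_y=0.1, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

# A single deep tree: high-variance base model.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# The same tree, bagged: trained on 100 bootstrap samples, predictions averaged.
bag = BaggingClassifier(
    DecisionTreeClassifier(random_state=0),  # base estimator (passed positionally)
    n_estimators=100,
    random_state=0,
).fit(X_train, y_train)

print(f"single tree:  {tree.score(X_valid, y_valid):.3f}")
print(f"bagged trees: {bag.score(X_valid, y_valid):.3f}")
```

With the label noise injected by flip_y, the single tree chases noise while the bagged average smooths it out.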

Tooling

Estimator | Type | When to use
RandomForestClassifier | bagging | strong default for tabular data
GradientBoostingClassifier | boosting | when you need sequential error correction
HistGradientBoostingClassifier | boosting | faster on larger datasets, handles missing values natively
AdaBoostClassifier | boosting | simpler boosting baseline
BaggingClassifier | bagging | wrapping any base estimator with bootstrap
VotingClassifier | voting | simple combination of diverse models
StackingClassifier | stacking | when you want a meta-learner on top of base models
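VotingClassifier is the only combiner in the table without an example below, so here is a minimal sketch. The data is synthetic and the three member models are arbitrary choices, picked only because they are diverse:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

# "soft" voting averages predicted probabilities; every member must
# therefore support predict_proba.
vote = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(max_depth=5)),
        ("nb", GaussianNB()),
    ],
    voting="soft",
)
vote.fit(X_train, y_train)
print(f"Validation accuracy: {vote.score(X_valid, y_valid):.3f}")
```

Unlike stacking, voting has no meta-learner: the combination rule is fixed (majority or probability average), which makes it simpler but less flexible.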

Minimal Examples

Random Forest

from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=200, max_depth=8, random_state=0)
rf.fit(X_train, y_train)
print(f"Validation accuracy: {rf.score(X_valid, y_valid):.3f}")
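One detail worth knowing: because each tree is trained on a bootstrap sample, it never sees roughly a third of the training rows, so a random forest can estimate its own generalization via the out-of-bag score, without a separate validation split. A sketch on synthetic stand-in data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# oob_score=True scores each row using only the trees that did not train on it.
rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X, y)
print(f"OOB accuracy estimate: {rf.oob_score_:.3f}")
```

The OOB estimate is convenient for quick comparisons, though a proper held-out set is still the honest final check.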

Gradient Boosting

from sklearn.ensemble import HistGradientBoostingClassifier

gb = HistGradientBoostingClassifier(max_iter=200, max_depth=4, learning_rate=0.1)
gb.fit(X_train, y_train)
print(f"Validation accuracy: {gb.score(X_valid, y_valid):.3f}")

Stacking

from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

estimators = [
    ("dt", DecisionTreeClassifier(max_depth=5)),
    ("svc", SVC(kernel="rbf", probability=True)),
]
stack = StackingClassifier(
    estimators=estimators,
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X_train, y_train)
print(f"Validation accuracy: {stack.score(X_valid, y_valid):.3f}")

Feature Importance

Ensemble methods provide built-in feature importance:

importances = rf.feature_importances_
sorted_idx = importances.argsort()[::-1]
for i in sorted_idx[:10]:
    print(f"  {feature_names[i]:>25}: {importances[i]:.4f}")

Use this as a debugging tool, not as ground truth—permutation importance is a more reliable check.
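The permutation check mentioned above is available in sklearn.inspection. A sketch, again on synthetic data; the feature names are placeholders:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=8, n_informative=3, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)
feature_names = [f"feat_{i}" for i in range(X.shape[1])]  # placeholder names

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Shuffle each column of the *validation* set and measure the score drop;
# a large drop means the model genuinely relies on that feature.
result = permutation_importance(rf, X_valid, y_valid, n_repeats=10, random_state=0)
for i in result.importances_mean.argsort()[::-1]:
    print(f"  {feature_names[i]:>8}: {result.importances_mean[i]:.4f}")
```

Impurity-based importances can inflate high-cardinality features; permutation importance measured on held-out data avoids that bias.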

Failure Pattern

A classic failure: training a random forest with unlimited depth on a small dataset and trusting the training score. The model memorizes every training example but generalizes poorly.
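The gap is easy to demonstrate. On a small, noisy synthetic dataset (the sizes and noise level here are illustrative), an unconstrained forest posts a near-perfect training score that the validation set does not support:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Small and noisy: 200 rows, with 20% of labels flipped at random.
X, y = make_classification(n_samples=200, n_features=20, flip_y=0.2, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

# max_depth=None: every tree grows until it memorizes its bootstrap sample.
rf = RandomForestClassifier(n_estimators=200, max_depth=None, random_state=0)
rf.fit(X_train, y_train)

train_acc = rf.score(X_train, y_train)
valid_acc = rf.score(X_valid, y_valid)
print(f"train {train_acc:.3f} vs. validation {valid_acc:.3f}")
```

The flipped labels cap the achievable validation accuracy, yet the forest still fits them on the training side, which is exactly the memorization the failure pattern describes.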

Another trap: fitting the meta-learner on base-model predictions made without cross-validation, so the stacking layer learns from predictions on data the base models have already seen. StackingClassifier's cv parameter exists to prevent exactly this leakage.

Common Mistakes

  • setting n_estimators too low and giving up on the model too early
  • not tuning max_depth or learning_rate for boosting models
  • treating feature importance as definitive without checking with permutation importance
  • assuming ensembles always beat simple models (they do not when the data is tiny or the features are weak)

Practice

  1. Compare a single decision tree against a random forest on the same split.
  2. Compare random forest against gradient boosting and explain which one wins and why.
  3. Inspect feature importances and name one feature that might be spurious.
  4. Build a stacking classifier and compare it against the best single model.
  5. Explain why boosting can overfit if the learning rate is too high.

Case Study: XGBoost in Kaggle Competitions

XGBoost, a gradient boosting implementation, won numerous Kaggle competitions by providing fast, accurate ensembles. Its success showed that well-tuned boosting can outperform complex neural networks on tabular data, and it reshaped standard ML workflows for structured datasets.

Expanded Quick Quiz

What's the main difference between bagging and boosting?

Answer: Bagging trains models independently on random subsets; boosting trains sequentially, each correcting previous errors.

Why do random forests reduce overfitting compared to single trees?

Answer: By averaging predictions from many trees trained on different bootstraps, they reduce variance without increasing bias much.

How does gradient boosting work?

Answer: It fits models to the residuals of previous models, gradually reducing the error by adding weak learners.
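The residual-fitting idea can be written out in a few lines for the regression case, assuming squared-error loss (where the negative gradient is exactly the residual). A minimal sketch on made-up data, not a production implementation:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)

learning_rate = 0.1
pred = np.full_like(y, y.mean())  # start from the constant mean prediction
for _ in range(100):
    residual = y - pred                          # negative gradient of squared error
    stump = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    pred += learning_rate * stump.predict(X)     # small step toward fixing the errors

print(f"MSE of mean baseline: {np.mean((y - y.mean()) ** 2):.4f}")
print(f"MSE after boosting:   {np.mean((y - pred) ** 2):.4f}")
```

Each weak learner only has to model what the ensemble so far gets wrong, which is why shallow trees suffice and why the learning rate controls how aggressively errors are corrected.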

In the loan default scenario, why use ensembles?

Answer: To create stable predictions that generalize better than a single overfitted tree, improving reliability for financial decisions.

Progress Checkpoint

  • [ ] Trained a random forest and gradient boosting model on tabular data.
  • [ ] Tuned hyperparameters and compared validation performance.
  • [ ] Analyzed feature importances and identified key predictors.
  • [ ] Answered quiz questions without peeking.

Milestone: Complete this to unlock "Hyperparameter Tuning" in the Classical ML track. Share your ensemble comparison in the academy Discord!

Further Reading

  • Scikit-Learn Ensemble Guide.
  • XGBoost documentation for advanced boosting.
  • "Elements of Statistical Learning" for theoretical foundations.

Longer Connection

Continue with Hyperparameter Tuning for systematic search across ensemble hyperparameters, and Cross-Validation for the split strategy that keeps ensemble evaluation honest.