Regression Metrics and Diagnostics¶
Scenario: Predicting House Prices¶
You're building a model to predict home sale prices based on features like size and location. Errors in thousands of dollars matter—use regression metrics to evaluate accuracy and diagnose where the model fails, ensuring reliable price estimates for buyers and sellers.
What This Is¶
Classification has accuracy, precision, and recall. Regression has its own set of metrics — and its own ways to lie to you. This topic covers how to measure regression quality and how to diagnose where a model fails.
When You Use It¶
- evaluating any model that predicts a continuous value
- choosing between MSE, MAE, and R² for model selection
- diagnosing whether errors are systematic or randomly scattered
- communicating model quality to stakeholders in interpretable units
Metric Comparison¶
| Metric | Formula Intuition | Sensitive To | Best For |
|---|---|---|---|
| MSE | average of squared errors | outliers (heavily) | penalizing large mistakes |
| RMSE | √MSE — same units as target | outliers | interpretable error magnitude |
| MAE | average of absolute errors | moderate outlier sensitivity | robust central error |
| R² | 1 − (MSE / variance of y) | scale-free | comparing across datasets |
| MAPE | % error relative to true value | small true values (divides by y) | business percentage targets |
| Median AE | median of absolute errors | resistant to outliers | when the median error matters more than the mean |
When Each Metric Misleads¶
- MSE/RMSE: one huge outlier can dominate the entire score
- MAE: hides the fact that some predictions are extremely wrong
- R²: can be negative (model worse than predicting the mean), and does not tell you where errors concentrate
- MAPE: explodes when true values are near zero
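The outlier sensitivity above is easy to verify numerically. A minimal sketch (the numbers are invented for illustration): corrupting a single prediction multiplies MSE by orders of magnitude while MAE grows far more modestly.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error

y_true = np.array([3.0, 5.0, 2.5, 7.0, 4.5])
y_good = np.array([2.8, 5.3, 2.0, 8.1, 4.2])   # all errors are small
y_outl = y_good.copy()
y_outl[3] = 27.0                               # one prediction off by 20

for name, pred in [("no outlier", y_good), ("one outlier", y_outl)]:
    mse = mean_squared_error(y_true, pred)
    mae = mean_absolute_error(y_true, pred)
    print(f"{name}: MSE={mse:.2f}  MAE={mae:.2f}")
```

Here MSE jumps from about 0.34 to about 80 (over 200×), while MAE rises from 0.48 to 4.26 (about 9×): the squared term lets one mistake dominate the score.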
Minimal Example¶
import numpy as np
from sklearn.metrics import (
    mean_squared_error, mean_absolute_error, r2_score,
    median_absolute_error,
    root_mean_squared_error,  # requires scikit-learn >= 1.4
)
y_true = np.array([3.0, 5.0, 2.5, 7.0, 4.5])
y_pred = np.array([2.8, 5.3, 2.0, 8.1, 4.2])
print(f"MSE: {mean_squared_error(y_true, y_pred):.4f}")
print(f"RMSE: {root_mean_squared_error(y_true, y_pred):.4f}")
print(f"MAE: {mean_absolute_error(y_true, y_pred):.4f}")
print(f"Median AE: {median_absolute_error(y_true, y_pred):.4f}")
print(f"R²: {r2_score(y_true, y_pred):.4f}")
Residual Analysis — The Real Diagnostic¶
Metrics give you one number. Residuals tell you the story.
import matplotlib.pyplot as plt
residuals = y_true - y_pred
# Residual plot (predictions vs errors): look for patterns, fan shapes, systematic bias
plt.scatter(y_pred, residuals)
plt.axhline(0, color="red", linestyle="--")
plt.show()
What residual patterns mean¶
| Pattern | Diagnosis |
|---|---|
| Random scatter around zero | ✅ the model is unbiased |
| Fan shape (errors grow with predictions) | heteroscedasticity — consider log-transforming the target |
| Curved pattern | the model misses a nonlinear relationship |
| Cluster of large errors in one region | the model fails on a specific subgroup |
| All residuals positive or negative | systematic bias — the model consistently over/under-predicts |
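The fan shape in the table can be reproduced on synthetic data (the data-generating process below is invented for illustration): when noise scales with the target, the magnitude of the residuals grows with the prediction, and a cheap numeric proxy for that is the correlation between |residual| and the predicted value.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(1, 10, size=(500, 1))
# multiplicative noise: error scale grows with the target (the classic fan shape)
y = 3.0 * X.ravel() * (1 + 0.2 * rng.standard_normal(500))

model = LinearRegression().fit(X, y)
residuals = y - model.predict(X)

# positive correlation between |residual| and prediction signals heteroscedasticity
corr = np.corrcoef(model.predict(X), np.abs(residuals))[0, 1]
print(f"corr(|residual|, prediction) = {corr:.2f}")
```

A correlation near zero suggests homoscedastic errors; a clearly positive value, as here, suggests trying a log transform of the target.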
The Diagnostic Ladder¶
- Compute metrics — MSE, MAE, R² for the overall picture
- Plot residuals vs predictions — look for patterns
- Plot residuals vs features — find where the model fails
- Check residuals by group — are errors worse for a subpopulation?
- Compare against a baseline — does the model beat DummyRegressor(strategy="mean")?
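Step 4 of the ladder, checking errors by group, can be sketched with pandas. The column names and values here are invented placeholders for a real validation frame:

```python
import pandas as pd

# hypothetical validation results with a grouping feature
df = pd.DataFrame({
    "neighborhood": ["A", "A", "B", "B", "B", "C"],
    "y_true": [200, 250, 400, 380, 420, 900],
    "y_pred": [210, 240, 390, 400, 410, 700],
})
df["abs_err"] = (df["y_true"] - df["y_pred"]).abs()
# mean absolute error per subgroup reveals where the model fails
print(df.groupby("neighborhood")["abs_err"].mean())
```

In this toy frame, neighborhoods A and B show small errors while C carries an error of 200: a subgroup failure the overall MAE would hide.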
Baseline Pattern¶
from sklearn.dummy import DummyRegressor
# assumes X_train, y_train, X_valid, y_valid, and model_pred already exist in your pipeline
dummy = DummyRegressor(strategy="mean")
dummy.fit(X_train, y_train)
dummy_pred = dummy.predict(X_valid)
print(f"Dummy MAE: {mean_absolute_error(y_valid, dummy_pred):.3f}")
print(f"Model MAE: {mean_absolute_error(y_valid, model_pred):.3f}")
If the model barely beats the dummy, the features are probably too weak — not the model.
When To Log-Transform The Target¶
If the target spans orders of magnitude (e.g., house prices from $50K to $5M), predicting in log space often helps:
import numpy as np
# assumes model, X_train, y_train, and X_valid come from your pipeline
y_log = np.log1p(y_train)           # log1p handles zero targets safely
model.fit(X_train, y_log)
pred_log = model.predict(X_valid)
pred_original = np.expm1(pred_log)  # invert back to the original price scale
Check whether residuals become more uniform after the transform.
Failure Pattern¶
Reporting only R² without checking residuals. An R² of 0.85 sounds good, but if all the errors concentrate on high-value predictions, the model is systematically failing where it matters most.
Another failure: using MAPE on data with zeros or near-zero values, which produces infinite or misleading percentages.
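The MAPE failure is easy to reproduce with toy numbers: a single near-zero true value dominates the average percentage even when its absolute error is tiny.

```python
import numpy as np
from sklearn.metrics import mean_absolute_percentage_error

y_true = np.array([100.0, 200.0, 0.5])  # one near-zero true value
y_pred = np.array([110.0, 190.0, 2.5])  # absolute error of only 2.0 on it

# per-point percentage errors: 10%, 5%, and 400% -> mean is dominated by the last
print(mean_absolute_percentage_error(y_true, y_pred))
```

The result is about 1.38, i.e. a reported 138% average error, even though two of the three predictions are within 10% of the truth.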
Common Mistakes¶
- comparing MSE across datasets with different target scales (use R² or normalize)
- forgetting that R² can be negative — it just means the model is worse than the mean
- treating a low MAE as proof of a good model when the residuals show systematic patterns
- optimizing for MSE when the business cares about MAE (or vice versa)
Practice¶
- Compute MSE, MAE, and R² for a regression model and explain what each tells you.
- Plot residuals versus predictions and describe the pattern you see.
- Add one outlier to the dataset and show how MSE changes compared to MAE.
- Compare a model against DummyRegressor and explain whether the model adds value.
- Apply a log transform to the target, retrain, and check whether residuals improve.
- Explain when you would prefer MAE over MSE for model selection.
Case Study: Regression in Financial Forecasting¶
Financial models use RMSE to evaluate stock price predictions, where large errors are penalized heavily. This helps prioritize models that avoid catastrophic mispredictions in volatile markets.
Expanded Quick Quiz¶
Why use RMSE instead of MSE?
Answer: RMSE is in the same units as the target, making errors more interpretable (e.g., dollars instead of squared dollars).
What does a negative R² mean?
Answer: The model performs worse than simply predicting the mean of the target.
How does MAE differ from MSE?
Answer: MAE is less sensitive to outliers, providing a robust measure of central error tendency.
In the house price scenario, why plot residuals?
Answer: To check for systematic errors (e.g., underpredicting expensive homes), guiding model improvements.
Progress Checkpoint¶
- [ ] Computed multiple regression metrics (MSE, MAE, R², RMSE).
- [ ] Plotted residuals vs. predictions to diagnose patterns.
- [ ] Analyzed outlier impact on metrics.
- [ ] Compared model against dummy regressor.
- [ ] Interpreted results for model selection.
- [ ] Answered quiz questions without peeking.
Milestone: Complete this to unlock "SVM Margins and Kernels" in the Classical ML track. Share your residual analysis in the academy Discord!
Further Reading¶
- Scikit-Learn Regression Metrics Guide.
- "Interpreting Regression Metrics" tutorials.
- Residual analysis best practices.
Runnable Example¶
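An end-to-end sketch on synthetic house-price data (all values invented): fit a model, compute interpretable metrics on a held-out split, and compare against the dummy baseline from earlier in the topic.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.uniform(50, 400, size=(300, 1))            # house size in square meters
y = 1500 * X.ravel() + rng.normal(0, 30_000, 300)  # price with additive noise

X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

model = LinearRegression().fit(X_train, y_train)
dummy = DummyRegressor(strategy="mean").fit(X_train, y_train)

for name, est in [("model", model), ("dummy", dummy)]:
    pred = est.predict(X_valid)
    print(f"{name}: MAE={mean_absolute_error(y_valid, pred):,.0f}  "
          f"R²={r2_score(y_valid, pred):.3f}")
```

The model's MAE lands far below the dummy's and its R² is close to 1 on this synthetic data; on real data, a much smaller gap would tell you the features carry little signal.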
Longer Connection¶
Continue with Evaluation Metrics Deep Dive for the classification-side counterpart, and Learning Curves and Bias-Variance for diagnosing whether the model needs more data or more capacity.