Regression Metrics and Diagnostics

Scenario: Predicting House Prices

You're building a model to predict home sale prices based on features like size and location. Errors in thousands of dollars matter—use regression metrics to evaluate accuracy and diagnose where the model fails, ensuring reliable price estimates for buyers and sellers.

What This Is

Classification has accuracy, precision, and recall. Regression has its own set of metrics — and its own ways to lie to you. This topic covers how to measure regression quality and how to diagnose where a model fails.

When You Use It

  • evaluating any model that predicts a continuous value
  • choosing between MSE, MAE, and R² for model selection
  • diagnosing whether errors are systematic or randomly scattered
  • communicating model quality to stakeholders in interpretable units

Metric Comparison

| Metric | Formula / Intuition | Sensitive To | Best For |
| --- | --- | --- | --- |
| MSE | average of squared errors | outliers (heavily) | penalizing large mistakes |
| RMSE | √MSE — same units as target | outliers | interpretable error magnitude |
| MAE | average of absolute errors | outliers (moderately) | robust central error |
| R² | 1 − (MSE / variance of y) | none (scale-free) | comparing across datasets |
| MAPE | % error relative to true value | small true values (divides by y) | business percentage targets |
| Median AE | median of absolute errors | resistant to outliers | when the median error matters more than the mean |

When Each Metric Misleads

  • MSE/RMSE: one huge outlier can dominate the entire score
  • MAE: hides the fact that some predictions are extremely wrong
  • R²: can be negative (model worse than predicting the mean), and does not tell you where errors concentrate
  • MAPE: explodes when true values are near zero
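To make the outlier failure mode concrete, here's a quick sketch with toy values: a single catastrophic miss inflates MSE far more than MAE.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error

y_true = np.array([3.0, 5.0, 2.5, 7.0, 4.5])
y_clean = np.array([2.8, 5.3, 2.0, 8.1, 4.2])
y_outlier = y_clean.copy()
y_outlier[0] = 30.0   # one catastrophic prediction

for name, pred in [("clean", y_clean), ("one outlier", y_outlier)]:
    print(f"{name:12s} MSE={mean_squared_error(y_true, pred):8.3f} "
          f"MAE={mean_absolute_error(y_true, pred):6.3f}")
```

One bad prediction multiplies MSE by a few hundred while MAE grows by roughly an order of magnitude — the squared term is doing all the damage.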

Minimal Example

```python
from sklearn.metrics import (
    mean_squared_error, mean_absolute_error, r2_score,
    median_absolute_error,
    root_mean_squared_error,  # requires scikit-learn >= 1.4
)
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0, 4.5])
y_pred = np.array([2.8, 5.3, 2.0, 8.1, 4.2])

print(f"MSE:       {mean_squared_error(y_true, y_pred):.4f}")
print(f"RMSE:      {root_mean_squared_error(y_true, y_pred):.4f}")
print(f"MAE:       {mean_absolute_error(y_true, y_pred):.4f}")
print(f"Median AE: {median_absolute_error(y_true, y_pred):.4f}")
print(f"R²:        {r2_score(y_true, y_pred):.4f}")
```

Residual Analysis — The Real Diagnostic

Metrics give you one number. Residuals tell you the story.

```python
import matplotlib.pyplot as plt

residuals = y_true - y_pred

# Residual plot: predictions vs errors
# Look for: patterns, fan shapes, systematic bias
plt.scatter(y_pred, residuals)
plt.axhline(0, color="red", linewidth=1)
plt.xlabel("prediction"); plt.ylabel("residual")
plt.show()
```

What residual patterns mean

| Pattern | Diagnosis |
| --- | --- |
| Random scatter around zero | ✅ the model is unbiased |
| Fan shape (errors grow with predictions) | heteroscedasticity — consider log-transforming the target |
| Curved pattern | the model misses a nonlinear relationship |
| Cluster of large errors in one region | the model fails on a specific subgroup |
| All residuals positive or negative | systematic bias — the model consistently over/under-predicts |
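A fan shape can also be checked numerically, not just visually. A sketch on synthetic heteroscedastic data (all values invented for illustration): split the residuals at the median prediction and compare their spread.

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic heteroscedastic data: noise grows with the prediction
y_pred = rng.uniform(100, 500, size=1000)     # e.g. prices in $K
residuals = rng.normal(0, 0.05 * y_pred)      # spread proportional to prediction

# split at the median prediction and compare residual spread
lo = residuals[y_pred < np.median(y_pred)]
hi = residuals[y_pred >= np.median(y_pred)]
print(f"residual std (low preds):  {lo.std():.2f}")
print(f"residual std (high preds): {hi.std():.2f}")
# a large ratio suggests a fan shape -> consider log-transforming the target
```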

The Diagnostic Ladder

  1. Compute metrics — MSE, MAE, R² for the overall picture
  2. Plot residuals vs predictions — look for patterns
  3. Plot residuals vs features — find where the model fails
  4. Check residuals by group — are errors worse for a subpopulation?
  5. Compare against a baseline — does the model beat DummyRegressor(strategy="mean")?
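Step 4 of the ladder is straightforward with pandas. A sketch on synthetic data — the `neighborhood` column and noise levels are invented for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 300
df = pd.DataFrame({
    "neighborhood": rng.choice(["A", "B", "C"], size=n),
    "y_true": rng.uniform(100, 600, size=n),
})
# pretend the model is much noisier in neighborhood C
noise = np.where(df["neighborhood"] == "C", 40.0, 10.0)
df["y_pred"] = df["y_true"] + rng.normal(0, noise)

# per-group error summary: the subgroup with outsized errors stands out
df["abs_err"] = (df["y_true"] - df["y_pred"]).abs()
print(df.groupby("neighborhood")["abs_err"].agg(["mean", "median", "count"]))
```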

Baseline Pattern

```python
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_absolute_error

# X_train, y_train, X_valid, y_valid, model_pred come from your own pipeline
dummy = DummyRegressor(strategy="mean")
dummy.fit(X_train, y_train)
dummy_pred = dummy.predict(X_valid)

print(f"Dummy MAE:  {mean_absolute_error(y_valid, dummy_pred):.3f}")
print(f"Model MAE:  {mean_absolute_error(y_valid, model_pred):.3f}")
```

If the model barely beats the dummy, the features are probably too weak — not the model.

When To Log-Transform The Target

If the target spans orders of magnitude (e.g., house prices from $50K to $5M), predicting in log space often helps:

```python
import numpy as np

# model, X_train, y_train, X_valid come from your own pipeline
y_log = np.log1p(y_train)            # log1p handles zero safely
model.fit(X_train, y_log)
pred_log = model.predict(X_valid)
pred_original = np.expm1(pred_log)   # invert back to the original scale
```

Check whether residuals become more uniform after the transform.
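One way to quantify "more uniform" is to compare residual spread between low and high predictions before and after the transform. A sketch on synthetic multiplicative-noise data (an idealized model is assumed, so only the noise structure matters):

```python
import numpy as np

rng = np.random.default_rng(1)
# skewed target with multiplicative noise, spanning roughly $50K to $5M (in $K)
base = np.exp(rng.uniform(np.log(50), np.log(5000), size=2000))
y_true = base * rng.lognormal(0, 0.2, size=2000)
y_pred = base   # an "ideal" model that recovers the base signal

def spread_ratio(resid, order):
    """Std of residuals on the high-prediction half / low-prediction half."""
    half = len(resid) // 2
    lo, hi = resid[order][:half], resid[order][half:]
    return hi.std() / lo.std()

order = np.argsort(y_pred)
raw = spread_ratio(y_true - y_pred, order)
logged = spread_ratio(np.log1p(y_true) - np.log1p(y_pred), order)
print(f"residual spread ratio (high/low preds): raw={raw:.1f}  log={logged:.1f}")
```

A ratio far above 1 on the raw scale and near 1 after the log transform is exactly the "residuals become more uniform" signal described above.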

Failure Pattern

Reporting only R² without checking residuals. An R² of 0.85 sounds good, but if all the errors concentrate on high-value predictions, the model is systematically failing where it matters most.

Another failure: using MAPE on data with zeros or near-zero values, which produces infinite or misleading percentages.
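The MAPE failure is easy to reproduce: a tiny absolute miss on a near-zero target dominates the score.

```python
import numpy as np
from sklearn.metrics import mean_absolute_percentage_error

y_true = np.array([100.0, 200.0, 0.5])   # one near-zero target
y_pred = np.array([110.0, 190.0, 2.0])

# the 0.5 -> 2.0 miss is tiny in absolute terms but 300% in relative terms
mape = mean_absolute_percentage_error(y_true, y_pred)
mae = np.mean(np.abs(y_true - y_pred))
print(f"MAPE: {mape:.2%}")
print(f"MAE:  {mae:.2f}")
```

Two predictions within 10% and one absolute error of 1.5 still push MAPE above 100%, while MAE stays modest.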

Common Mistakes

  • comparing MSE across datasets with different target scales (use R² or normalize)
  • forgetting that R² can be negative — it just means the model is worse than the mean
  • treating a low MAE as proof of a good model when the residuals show systematic patterns
  • optimizing for MSE when the business cares about MAE (or vice versa)

Practice

  1. Compute MSE, MAE, and R² for a regression model and explain what each tells you.
  2. Plot residuals versus predictions and describe the pattern you see.
  3. Add one outlier to the dataset and show how MSE changes compared to MAE.
  4. Compare a model against DummyRegressor and explain whether the model adds value.
  5. Apply a log transform to the target, retrain, and check whether residuals improve.
  6. Explain when you would prefer MAE over MSE for model selection.

Case Study: Regression in Financial Forecasting

Financial models use RMSE to evaluate stock price predictions, where large errors are penalized heavily. This helps prioritize models that avoid catastrophic mispredictions in volatile markets.

Expanded Quick Quiz

Why use RMSE instead of MSE?

Answer: RMSE is in the same units as the target, making errors more interpretable (e.g., dollars instead of squared dollars).

What does a negative R² mean?

Answer: The model performs worse than simply predicting the mean of the target.

How does MAE differ from MSE?

Answer: MAE is less sensitive to outliers, providing a robust measure of central error tendency.

In the house price scenario, why plot residuals?

Answer: To check for systematic errors (e.g., underpredicting expensive homes), guiding model improvements.

Progress Checkpoint

  • [ ] Computed multiple regression metrics (MSE, MAE, R², RMSE).
  • [ ] Plotted residuals vs. predictions to diagnose patterns.
  • [ ] Analyzed outlier impact on metrics.
  • [ ] Compared model against dummy regressor.
  • [ ] Interpreted results for model selection.
  • [ ] Answered quiz questions without peeking.

Milestone: Complete this to unlock "SVM Margins and Kernels" in the Classical ML track. Share your residual analysis in the academy Discord!

Further Reading

  • Scikit-Learn Regression Metrics Guide.
  • "Interpreting Regression Metrics" tutorials.
  • Residual analysis best practices.

Runnable Example
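A self-contained end-to-end sketch pulling the lesson's pieces together — synthetic size/price data (all numbers invented for illustration), a linear model, a dummy baseline, and residual diagnostics:

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
X = rng.uniform(50, 300, size=(n, 1))            # house size in m² (synthetic)
y = 2.0 * X[:, 0] + rng.normal(0, 25, size=n)    # price in $K, with noise

X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

model = LinearRegression().fit(X_train, y_train)
dummy = DummyRegressor(strategy="mean").fit(X_train, y_train)

# step 1 and step 5 of the diagnostic ladder: metrics plus a baseline
for name, est in [("model", model), ("dummy", dummy)]:
    pred = est.predict(X_valid)
    print(f"{name}: MAE={mean_absolute_error(y_valid, pred):7.2f} "
          f"R²={r2_score(y_valid, pred):6.3f}")

# step 2: residual summary — mean near zero means no systematic bias
residuals = y_valid - model.predict(X_valid)
print(f"residual mean={residuals.mean():.2f} std={residuals.std():.2f}")
```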

Longer Connection

Continue with Evaluation Metrics Deep Dive for the classification-side counterpart, and Learning Curves and Bias-Variance for diagnosing whether the model needs more data or more capacity.