
Learning Curves and Bias-Variance

What This Is

Learning curves answer one practical question:

  • should the next move be more data, a simpler model, a stronger model, or better features?

The curves matter because they turn vague bias-variance language into a visible decision.

When You Use It

  • after a first baseline
  • when train and validation behavior disagree
  • when you need to decide whether collecting more data is worth it
  • when a model feels too simple or too flexible and you want evidence

Read The Curves As Decisions

Pattern | Likely story | Next move
both train and validation scores low | underfitting | use a stronger model or better features
train score high, validation score low, large gap | overfitting | simplify, regularize, or add data
both scores high and close together | healthy fit | do not change much without a new reason
validation score still climbing as data grows | data may still help | collect or simulate more data if practical
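The table above can be sketched as a small decision helper. The function name and the thresholds below are illustrative assumptions, not fixed rules; calibrate them to your metric and problem.

```python
def next_move(train_score, valid_score, gap_threshold=0.1, low_threshold=0.7):
    """Map the final point of a learning curve to a suggested next move.

    The thresholds are illustrative assumptions; tune them to your metric.
    """
    gap = train_score - valid_score
    if train_score < low_threshold and valid_score < low_threshold:
        return "underfitting: stronger model or better features"
    if gap > gap_threshold:
        return "overfitting: simplify, regularize, or add data"
    return "healthy fit: do not change much without a new reason"

print(next_move(0.65, 0.62))  # both low -> underfitting
print(next_move(0.98, 0.80))  # large gap -> overfitting
print(next_move(0.90, 0.88))  # close and high -> healthy fit
```

Reading the last curve point through a rule like this is a simplification; the full curve shape (especially whether validation is still climbing) should confirm the call.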

Minimal Pattern

from sklearn.model_selection import learning_curve
import numpy as np

# model, X_train, and y_train are assumed to be defined already
train_sizes, train_scores, valid_scores = learning_curve(
    estimator=model,
    X=X_train,
    y=y_train,
    train_sizes=np.linspace(0.1, 1.0, 8),
    cv=5,
    scoring="accuracy",
)

# average across the cv folds before reading the curves
train_mean = train_scores.mean(axis=1)
valid_mean = valid_scores.mean(axis=1)

The point is not the plot alone. The point is the next action the plot justifies.

What To Inspect First

Inspect:

  • the train curve
  • the validation curve
  • the gap between them
  • whether the validation curve is flattening

If you read only the validation curve, you lose the reason behind the shape.
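Those four checks can be sketched in a few lines. The arrays below are illustrative stand-ins for the per-size fold averages a `learning_curve` run would produce, and the flattening threshold is an assumption.

```python
import numpy as np

# assumed: per-size fold averages from a learning_curve run (illustrative values)
train_mean = np.array([0.95, 0.93, 0.92, 0.91])
valid_mean = np.array([0.70, 0.78, 0.82, 0.84])

gap = train_mean - valid_mean            # train-validation gap at each size
final_gap = gap[-1]

# flattening: how much the validation curve gained over the last step
recent_slope = valid_mean[-1] - valid_mean[-2]
still_climbing = recent_slope > 0.01     # illustrative threshold

print(f"final gap: {final_gap:.2f}, still climbing: {bool(still_climbing)}")
```

Here the gap is shrinking and validation is still climbing, so more data is a defensible next move; a wide, stable gap would instead point at variance.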

Failure Pattern

The common failure is collecting more data when the model is clearly underfitting.

The opposite failure is adding model complexity when the train-validation gap is already wide.

Learning curves exist to stop both mistakes.

Common Mistakes

  • reading only one curve
  • ignoring the train-validation gap
  • assuming more data always helps
  • plotting a noisy single split and treating it as stable evidence
  • changing model capacity and data size at the same time

A Good Curve Note

After one curve read, the learner should be able to say:

  • whether the main problem is bias or variance
  • whether more data is likely to help
  • whether the next move should target model capacity or feature quality

Practice

  1. Plot a learning curve and decide whether the model is underfitting or overfitting.
  2. Explain one case where more data is useful and one where it is not.
  3. Compare a simpler and a more complex model on the same learning-curve view.
  4. State what action the curve justifies next.

Runnable Example
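A self-contained sketch, assuming scikit-learn is installed; the synthetic dataset and the logistic-regression model are illustrative choices, not part of any fixed recipe.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

# synthetic dataset: an illustrative stand-in for real training data
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
model = LogisticRegression(max_iter=1000)

train_sizes, train_scores, valid_scores = learning_curve(
    estimator=model,
    X=X,
    y=y,
    train_sizes=np.linspace(0.1, 1.0, 8),
    cv=5,
    scoring="accuracy",
)

# average across the cv folds, then read train, validation, and the gap
train_mean = train_scores.mean(axis=1)
valid_mean = valid_scores.mean(axis=1)

for size, tr, va in zip(train_sizes, train_mean, valid_mean):
    print(f"n={size:4d}  train={tr:.3f}  valid={va:.3f}  gap={tr - va:.3f}")
```

The printout is enough to apply the decision table directly; a plot of the same two arrays tells the identical story visually.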

Longer Connection

Continue with Cross-Validation for the evaluation boundary behind these curves, and Hyperparameter Tuning when the curves say the model family is promising but the setting is not.