Learning Curves and Bias-Variance¶
What This Is¶
Learning curves answer one practical question:
- should the next move be more data, a simpler model, a stronger model, or better features?
The curves matter because they turn vague bias-variance language into a visible decision.
When You Use It¶
- after a first baseline
- when train and validation behavior disagree
- when you need to decide whether collecting more data is worth it
- when a model feels too simple or too flexible and you want evidence
Read The Curves As Decisions¶
| Pattern | Likely story | Next move |
|---|---|---|
| train and validation scores both low | underfitting | use a stronger model or better features |
| train score high, validation score low, large gap | overfitting | simplify, regularize, or add data |
| both scores high and close | healthy fit | do not change much without a new reason |
| validation score still climbing with more data | more data may still help | collect or simulate more data if practical |
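The table above can be sketched as a small decision helper. This is an illustrative assumption, not a library function: the `next_move` name and the thresholds (0.85 as "high", 0.05 as a "large gap") are hypothetical and should be tuned to the metric and problem at hand.

```python
# Hypothetical helper mapping mean curve scores to a next move.
# The 0.85 and 0.05 thresholds are illustrative assumptions.
def next_move(train_score, valid_score, high=0.85, large_gap=0.05):
    if train_score < high and valid_score < high:
        # both scores low: underfitting
        return "strengthen model or features"
    if train_score - valid_score > large_gap:
        # wide train-validation gap: overfitting
        return "simplify, regularize, or add data"
    # both high and close: healthy fit
    return "healthy fit: no change without a new reason"
```

Used on mean scores from a learning curve, it makes the table's logic explicit: check for low scores first, then the gap, and only then leave the model alone.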
Minimal Pattern¶
```python
from sklearn.model_selection import learning_curve
import numpy as np

# model, X_train, and y_train are assumed to be defined already
train_sizes, train_scores, valid_scores = learning_curve(
    estimator=model,
    X=X_train,
    y=y_train,
    train_sizes=np.linspace(0.1, 1.0, 8),  # 8 training-set sizes, 10% to 100%
    cv=5,                                  # 5-fold cross-validation at each size
    scoring="accuracy",
)
```
The point is not the plot alone. The point is the next action the plot justifies.
What To Inspect First¶
Inspect:
- the train curve
- the validation curve
- the gap between them
- whether the validation curve is flattening
If you read only the validation curve, you lose the reason behind the shape.
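The "is the validation curve flattening" check can be made concrete by comparing the last gain to earlier ones. The numbers and the 0.01 threshold here are illustrative assumptions, not a rule:

```python
import numpy as np

# Illustrative mean validation scores at increasing training sizes.
valid_mean = np.array([0.70, 0.78, 0.82, 0.84, 0.845])

# If the most recent gain is tiny, the curve is flattening and
# more data alone is unlikely to move the validation score much.
last_gain = valid_mean[-1] - valid_mean[-2]
flattening = last_gain < 0.01  # illustrative threshold
```

A flattening validation curve shifts the decision away from "collect more data" and toward model capacity or feature quality.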
Failure Pattern¶
The common failure is collecting more data when the model is clearly underfitting; more examples cannot fix a model that lacks the capacity to use them.
The opposite failure is adding model complexity when the train-validation gap is already wide; a more flexible model only widens it.
Learning curves exist to stop both mistakes.
Common Mistakes¶
- reading only one curve
- ignoring the train-validation gap
- assuming more data always helps
- plotting a noisy single split and treating it as stable evidence
- changing model capacity and data size at the same time
A Good Curve Note¶
After one curve read, the learner should be able to say:
- whether the main problem is bias or variance
- whether more data is likely to help
- whether the next move should target model capacity or feature quality
Practice¶
- Plot a learning curve and decide whether the model is underfitting or overfitting.
- Explain one case where more data is useful and one where it is not.
- Compare a simpler and a more complex model on the same learning-curve view.
- State what action the curve justifies next.
Runnable Example¶
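A minimal end-to-end sketch on synthetic data. The dataset, the logistic-regression model, and the five training sizes are illustrative choices, not prescriptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

# Synthetic classification data so the example runs anywhere.
X, y = make_classification(n_samples=600, n_features=20, random_state=0)
model = LogisticRegression(max_iter=1000)

train_sizes, train_scores, valid_scores = learning_curve(
    model, X, y,
    train_sizes=np.linspace(0.1, 1.0, 5),  # 5 sizes from 10% to 100%
    cv=5,
    scoring="accuracy",
)

# Average across folds, then read train, validation, and gap per size.
train_mean = train_scores.mean(axis=1)
valid_mean = valid_scores.mean(axis=1)
for n, t, v in zip(train_sizes, train_mean, valid_mean):
    print(f"n={n:4d}  train={t:.3f}  valid={v:.3f}  gap={t - v:+.3f}")
```

Reading the printed table against the decision table above tells you which move — data, capacity, or features — the curve justifies.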
Longer Connection¶
Continue with Cross-Validation for the evaluation boundary behind these curves, and Hyperparameter Tuning when the curves say the model family is promising but the setting is not.