Track 02
scikit-learn Validation and Tuning
This track turns evaluation discipline into code: fix the split, choose the metric, compare against a baseline, tune inside the boundary, and reject flattering improvements that do not survive honest validation.
Primary Goal
Trust The Experiment
The point is not to make a score rise. The point is to know whether the score is worth believing.
Best For
Baseline To Selection Discipline
Use this track when the next bottleneck is no longer data handling but model comparison, cross-validation, and leakage control.
Exit Rule
One Honest Validation Story
You are done when you can defend the split, metric, baseline, and tuning path in one short note.
Use This Track When
- the first tabular workflow is already stable
- you are ready to compare models honestly
- you need cross-validation, tuning, calibration, and leakage checks to feel mechanical
What This Track Is Training
This track trains one practical rule:
- tune only inside a boundary you would still trust after the result looks good
That means the learner should be able to keep these explicit:
- the prediction unit
- the split rule
- the primary metric
- the baseline
- the leakage risk
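The boundary rule above can be sketched in code: keep every fitted preprocessing step inside a pipeline, so that tuning only ever sees training folds. This is a minimal sketch on synthetic data, not the track's actual example script.

```python
# Sketch: tune only inside a boundary you would still trust afterward.
# The scaler lives inside the pipeline, so each CV fold fits its own
# scaler on that fold's training rows only. Data here is synthetic.
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),              # refit per fold, never on held-out rows
    ("clf", LogisticRegression(max_iter=1000)),
])

# The grid search only ever sees training folds; any final test set
# (not shown here) stays untouched until selection is finished.
grid = GridSearchCV(pipe, {"clf__C": [0.01, 0.1, 1.0, 10.0]}, cv=5)
grid.fit(X, y)
print(grid.best_params_)
```

The same shape works for any fitted transform (imputation, encoding, feature selection): if it learns from data, it belongs inside the pipeline, inside the boundary.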
First Session
Use this order:
- Honest Splits and Baselines
- Leakage Patterns
- run academy/.venv/bin/python academy/examples/validation-baseline-comparison/baseline_comparison.py
- run academy/.venv/bin/python academy/examples/classical-ml-recipes/cross_validation_demo.py
- write one note on what would invalidate the experiment
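The baseline comparison in the first session has a simple shape, sketched here on synthetic data rather than the track's example dataset:

```python
# Sketch of a baseline-versus-model comparison: a DummyClassifier sets
# the floor, and the learned model must clear it under the same folds.
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, random_state=0)

baseline = cross_val_score(DummyClassifier(strategy="most_frequent"), X, y, cv=5)
model = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)

print(f"baseline: {baseline.mean():.3f}")
print(f"model:    {model.mean():.3f}")
# A model that cannot beat the dummy baseline is not worth tuning.
```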
Full Track Loop
For the complete workflow:
- review the validation topics in order
- run the baseline, cross-validation, tuning, and calibration examples
- run
academy/.venv/bin/python academy/labs/sklearn-validation-and-tuning/src/validation_tuning_workflow.py - finish the matching exercises in
academy/exercises/sklearn-validation-and-tuning/ - keep one short experiment note with the split, metric, baseline, and selected model
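The overall shape of that workflow can be sketched as: fix one split up front, confine all selection to the training side, and read the test set exactly once. This is an illustrative sketch, not the lab script itself.

```python
# Sketch of the full loop: split once, tune on the training side only,
# then take a single final read on the held-out test set.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    {"max_depth": [2, 4, None]},
    cv=5,
)
search.fit(X_tr, y_tr)  # selection happens on training folds only

print(f"selected: {search.best_params_}")
print(f"test:     {search.score(X_te, y_te):.3f}")  # read exactly once
```

The split rule, metric, baseline, and selected model from a run like this are exactly what the experiment note should record.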
What To Inspect
By the end of the track, the learner should have inspected:
- baseline versus learned model
- fold mean and spread
- one tuning table
- one calibration or threshold view
- one leakage suspicion that was tested directly
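Two of those artifacts, the fold mean with its spread and a small tuning table, can be produced directly from scikit-learn's outputs. Data here is synthetic and the grid is illustrative.

```python
# Sketch: report fold mean and spread, then a tuning table from cv_results_.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = make_classification(n_samples=300, random_state=0)

# Fold mean and spread: a mean without its std hides instability.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(f"folds: mean={scores.mean():.3f} std={scores.std():.3f}")

# One tuning table: each candidate with its mean and spread, not just the winner.
grid = GridSearchCV(LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}, cv=5)
grid.fit(X, y)
for c, m, s in zip(grid.cv_results_["param_C"],
                   grid.cv_results_["mean_test_score"],
                   grid.cv_results_["std_test_score"]):
    print(f"C={c}  mean={m:.3f}  std={s:.3f}")
```

A tuning "gain" smaller than the fold spread is noise, which is why the table shows both columns.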
Common Failure Modes
- peeking at the test set during selection
- changing the split and the model at the same time
- comparing metrics that do not match the task cost
- claiming a tuning gain without showing baseline and fold spread
- hiding preprocessing inside the wrong boundary
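The last failure mode is easy to demonstrate: fit a label-using preprocessing step on all rows before cross-validation and pure noise starts to look predictive. This sketch uses feature selection as the leaky step; the dataset is random by construction.

```python
# Sketch: preprocessing hidden outside the CV boundary leaks labels.
# X is pure noise, so honest accuracy should sit near chance (0.5).
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2000))
y = rng.integers(0, 2, size=100)

# Wrong boundary: selection sees every label before the folds are cut.
X_leaky = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky = cross_val_score(LogisticRegression(max_iter=1000), X_leaky, y, cv=5)

# Right boundary: selection is refit inside each training fold.
pipe = make_pipeline(SelectKBest(f_classif, k=20), LogisticRegression(max_iter=1000))
honest = cross_val_score(pipe, X, y, cv=5)

print(f"leaky:  {leaky.mean():.3f}")   # flattering and false
print(f"honest: {honest.mean():.3f}")  # near chance, as it should be
```

The gap between the two numbers is the size of the lie; testing a leakage suspicion this directly is what the inspection list above asks for.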
Exit Standard
Before leaving this track, the learner should be able to:
- defend the split rule
- explain why the chosen metric matches the task
- compare a baseline against the selected model honestly
- name one leakage pattern that was avoided
- say what result would still count as untrustworthy
That is enough to move into SVM and Advanced Clustering or the first deep-learning track.
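One concrete way to back the last exit point, naming what would still count as untrustworthy, is a permutation test: a score that survives label shuffling was never measuring the signal. This is an optional sketch on synthetic data, not part of the track's scripts.

```python
# Sketch: compare the real score against scores on shuffled labels.
# If the real score sits inside the shuffled distribution, distrust it.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import permutation_test_score

X, y = make_classification(n_samples=200, random_state=0)

score, perm_scores, pvalue = permutation_test_score(
    LogisticRegression(max_iter=1000), X, y, cv=5,
    n_permutations=30, random_state=0,
)
print(f"score={score:.3f} shuffled={perm_scores.mean():.3f} p={pvalue:.3f}")
```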