Track 09
Mock Tasks and Timed Workflows
This track turns open-ended work into a disciplined task loop: baseline first, fixed validation rule, one deliberate improvement, one slice check, and a clean handoff at the end.
Primary Goal
Make One Strong Decision Chain
The point is not to try everything. The point is to preserve a clean split, one rule, and one accountable iteration under time pressure.
You Will Practice
Baseline, Triage, Stop Rule
Model ladders, fixed metric ranking, weak-slice inspection, submission hygiene, and short reports that explain why the workflow stopped.
Best First Move
Run Two Timed Demos
Use the baseline-first and error-triage examples first, then move into the full workflow once the decision rhythm already feels familiar.
Use This Track When
- you can already run a baseline and read a validation table
- the main weakness is stopping discipline rather than setup
- you need practice defending one iteration under a time budget
- you want a route that feels closer to competition or real task pressure
What This Track Is Training
This track trains one practical rule:
- do not spend a timed task on moves you cannot justify
That means the learner should keep these fixed and visible:
- the split rule
- the primary metric
- the baseline
- the weakest slice
- the stopping condition
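One way to keep these items fixed and visible is to write them into an immutable record before the first run. The sketch below is only an illustration: the `TaskPlan` class, its field names, and the example values are assumptions, not part of the track's lab code. The weakest slice is recorded once, after the baseline has been read.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class TaskPlan:
    """Decisions fixed before the first leaderboard appears (hypothetical names)."""
    split_rule: str
    primary_metric: str
    baseline: str
    stopping_condition: str
    weakest_slice: Optional[str] = None  # filled in once, after the baseline run


plan = TaskPlan(
    split_rule="stratified 80/20, seed 0",      # example value, an assumption
    primary_metric="macro_f1",                  # example value, an assumption
    baseline="majority_class",                  # example value, an assumption
    stopping_condition="stop after one improvement or when the time budget ends",
)

# Changing the metric mid-task fails loudly instead of happening silently.
try:
    plan.primary_metric = "accuracy"
    metric_changed = True
except FrozenInstanceError:
    metric_changed = False

# Recording the weakest slice creates a new, explicit version of the plan.
plan_after_baseline = replace(plan, weakest_slice="long documents")

print(plan_after_baseline)
print("metric changed mid-task:", metric_changed)
```

A frozen record does not stop anyone from writing a second plan, but it makes the change explicit instead of silent.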
First Session
Use this order:
- Baseline-First Task Solving
- run academy/.venv/bin/python academy/examples/mock-task-recipes/baseline_first_demo.py
- run academy/.venv/bin/python academy/examples/mock-task-recipes/error_triage_demo.py
- read one clinic such as Public/Private Restraint
- write one stop-or-continue note before touching the full lab
Full Track Loop
For the complete workflow:
- run the two short timed-task examples first
- run academy/.venv/bin/python academy/labs/mock-tasks-and-timed-workflows/src/mock_task_workflow.py
- inspect the leaderboard and weakest-slice summary before reading the holdout summary
- run academy/.venv/bin/python academy/labs/mock-tasks-and-timed-workflows/src/chronological_mock_task_workflow.py only after the base workflow is readable
- finish the matching exercises in academy/exercises/mock-tasks-and-timed-workflows/
- keep one short report with the baseline, selected model, weakest slice, and stopping reason
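The short report in the last step can be as small as four fields. The following is a minimal sketch, assuming a JSON layout; the `build_report` helper, the field names, and the example values are hypothetical and are not produced by the lab scripts.

```python
import json

# The four items the track asks the report to contain.
REQUIRED_FIELDS = ("baseline", "selected_model", "weakest_slice", "stopping_reason")


def build_report(**fields):
    """Return a JSON report, refusing to emit one with a missing or empty field."""
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
    if missing:
        raise ValueError(f"report is incomplete, missing: {', '.join(missing)}")
    return json.dumps({name: fields[name] for name in REQUIRED_FIELDS}, indent=2)


# Example values are invented for illustration only.
report = build_report(
    baseline="majority_class, macro_f1 0.41",
    selected_model="logistic_regression, macro_f1 0.58",
    weakest_slice="long documents, support 37",
    stopping_reason="iteration budget spent; weakest slice improved",
)
print(report)
```

Raising on a missing field keeps the stopping reason from being left blank and filled in later.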
What To Inspect
By the end of the track, the learner should have inspected:
- baseline versus selected model under one fixed metric rule
- one ranked validation leaderboard
- one weakest-slice comparison with support counts
- one clean submission artifact tied to a named run
- one chronological comparison showing whether the winner survives a stricter split
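A ranked leaderboard and a weakest-slice comparison with support counts can both come from the same grouped computation. The sketch below uses invented validation rows and accuracy as the fixed metric; the row layout and the helper names are assumptions and do not reflect the output format of the lab scripts.

```python
from collections import defaultdict

# Hypothetical validation rows: (run_name, slice_name, is_correct)
rows = [
    ("baseline", "short", 1), ("baseline", "short", 1), ("baseline", "short", 0),
    ("baseline", "long", 0), ("baseline", "long", 1),
    ("candidate", "short", 1), ("candidate", "short", 1), ("candidate", "short", 1),
    ("candidate", "long", 0), ("candidate", "long", 1),
]


def accuracy_by(rows, key):
    """Return {group: (accuracy, support)} for rows grouped by key(row)."""
    hits, counts = defaultdict(int), defaultdict(int)
    for row in rows:
        group = key(row)
        hits[group] += row[2]
        counts[group] += 1
    return {group: (hits[group] / counts[group], counts[group]) for group in counts}


# Ranked leaderboard under one fixed metric (accuracy here).
overall = accuracy_by(rows, key=lambda r: r[0])
leaderboard = sorted(overall.items(), key=lambda kv: kv[1][0], reverse=True)

# Per-run, per-slice scores with support counts.
per_slice = accuracy_by(rows, key=lambda r: (r[0], r[1]))


def weakest_slice(run):
    """Return (slice_name, (accuracy, support)) for the run's lowest-scoring slice."""
    slices = {s: v for (r, s), v in per_slice.items() if r == run}
    return min(slices.items(), key=lambda kv: kv[1][0])


for run, (score, support) in leaderboard:
    name, (slice_score, slice_support) = weakest_slice(run)
    print(f"{run}: overall={score:.2f} (n={support}), "
          f"weakest={name} {slice_score:.2f} (n={slice_support})")
```

In this toy data the candidate wins the overall ranking while its weakest slice is unchanged, which is the case where the winner improved the average without fixing the main weakness.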
Common Failure Modes
- changing the split after the first leaderboard appears
- choosing the winner with whichever metric flatters the latest run
- adding extra models because they exist, not because the evidence is weak
- reading the overall score before checking the weakest slice
- writing the report after the fact without a real stopping rule
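The last failure mode is easier to avoid when the stopping rule exists as something checkable before the first iteration. The following is a rough sketch, assuming a rule built from a time reserve, an iteration budget, and a weakest-slice target; the function name, parameters, and thresholds are all hypothetical.

```python
def stop_or_continue(minutes_left, iterations_used, weakest_slice_score,
                     reserve_minutes=15, max_iterations=1, slice_target=0.70):
    """Return (decision, reason) from a rule written down before the task starts.

    All thresholds are illustrative defaults, not values taken from the track.
    """
    if minutes_left <= reserve_minutes:
        return "stop", "time reserve reached; remaining minutes go to the handoff"
    if iterations_used >= max_iterations:
        return "stop", "iteration budget spent"
    if weakest_slice_score >= slice_target:
        return "stop", "weakest slice already meets the target"
    return "continue", "weakest slice is below target and one iteration remains"


decision, reason = stop_or_continue(
    minutes_left=40, iterations_used=0, weakest_slice_score=0.50
)
print(decision, "-", reason)
```

Because the function returns a reason alongside the decision, the stop-or-continue note can quote it directly.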
Exit Standard
Before leaving this track, the learner should be able to:
- defend the baseline and the selected model under one fixed rule
- explain whether the winner fixed the main weakness or only improved the average
- say whether the chronological split changed the decision
- point to the first artifact they would show if someone challenged the result
- make a real stop-or-continue call instead of hiding behind "more experiments"
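The chronological question above comes down to two steps: hold out the most recent records, then check whether the same model still ranks first. The sketch below assumes records carry a sortable `ts` field and uses invented scores; it is not the logic of chronological_mock_task_workflow.py, and every name in it is hypothetical.

```python
def chronological_split(records, valid_fraction=0.25):
    """Sort by timestamp and hold out the most recent fraction for validation."""
    ordered = sorted(records, key=lambda r: r["ts"])
    n_valid = max(1, int(len(ordered) * valid_fraction))
    cut = len(ordered) - n_valid
    return ordered[:cut], ordered[cut:]


def winner(scores):
    """Return the run name with the highest score under the fixed metric."""
    return max(scores, key=scores.get)


# Hypothetical records with integer timestamps in arbitrary order.
records = [{"id": i, "ts": t} for i, t in enumerate([5, 1, 4, 2, 8, 7, 3, 6])]
train, valid = chronological_split(records)

# Every training record is older than every validation record.
assert max(r["ts"] for r in train) < min(r["ts"] for r in valid)

# Invented scores for illustration; real values would come from the two runs.
random_split_scores = {"baseline": 0.71, "candidate": 0.78}
chrono_split_scores = {"baseline": 0.69, "candidate": 0.66}

decision_changed = winner(random_split_scores) != winner(chrono_split_scores)
print("validation timestamps:", [r["ts"] for r in valid])
print("chronological split changed the decision:", decision_changed)
```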