Track 09

Mock Tasks and Timed Workflows

This track turns open-ended work into a disciplined task loop: baseline first, fixed validation rule, one deliberate improvement, one slice check, and a clean handoff at the end.


Primary Goal

Make One Strong Decision Chain

The point is not to try everything. The point is to preserve a clean split, one rule, and one accountable iteration under time pressure.

You Will Practice

Baseline, Triage, Stop Rule

Model ladders, fixed metric ranking, weak-slice inspection, submission hygiene, and short reports that explain why the workflow stopped.

Best First Move

Run Two Timed Demos

Use the baseline-first and error-triage examples first, then move into the full workflow once the decision rhythm already feels familiar.

Use This Track When

you can already run a baseline and read a validation table
the main weakness is stopping discipline rather than setup
you need practice defending one iteration under a time budget
you want a route that feels closer to competition or real task pressure

What This Track Is Training

This track trains one practical rule:

do not spend a timed task on moves you cannot justify

That means the learner should keep these fixed and visible:

the split rule
the primary metric
the baseline
the weakest slice
the stopping condition
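
One way to keep these five items fixed and visible is to write them down as an immutable record before the first run. The sketch below is only an illustration: the `TaskPlan` class, its field values, and the helper name are hypothetical and are not part of the track's code.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class TaskPlan:
    """Decisions written down before the first leaderboard appears.

    Hypothetical structure; every value below is a placeholder.
    """
    split_rule: str
    primary_metric: str
    baseline: str
    stopping_condition: str
    weakest_slice: str = "not measured yet"


plan = TaskPlan(
    split_rule="80/20 holdout, seed 0",
    primary_metric="macro_f1",
    baseline="majority_class",
    stopping_condition="stop after one improvement or when 15 minutes remain",
)


def attempt_metric_swap(task_plan):
    """Show that a frozen plan refuses a mid-task metric change."""
    try:
        task_plan.primary_metric = "accuracy"
    except FrozenInstanceError:
        return "rejected"
    return "changed"


# The weakest slice is only known after the baseline run, so it is recorded
# once by creating a new plan instead of mutating the old one.
plan_after_baseline = replace(plan, weakest_slice="new_users")

print(attempt_metric_swap(plan))          # the swap is refused
print(plan_after_baseline.weakest_slice)  # recorded once, then left alone
```

Freezing the record makes a quiet metric or split change fail loudly, which is the behaviour a timed task needs.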

First Session

Use this order:

read Baseline-First Task Solving
run academy/.venv/bin/python academy/examples/mock-task-recipes/baseline_first_demo.py
run academy/.venv/bin/python academy/examples/mock-task-recipes/error_triage_demo.py
read one clinic such as Public/Private Restraint
write one stop-or-continue note before touching the full lab

Full Track Loop

For the complete workflow:

run the two short timed-task examples first
run academy/.venv/bin/python academy/labs/mock-tasks-and-timed-workflows/src/mock_task_workflow.py
inspect the leaderboard and weakest-slice summary before reading the holdout summary
run academy/.venv/bin/python academy/labs/mock-tasks-and-timed-workflows/src/chronological_mock_task_workflow.py only after the base workflow is readable
finish the matching exercises in academy/exercises/mock-tasks-and-timed-workflows/
keep one short report with the baseline, selected model, weakest slice, and stopping reason
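
The last step, the short report, can be as small as four lines built from a leaderboard ranked under the one fixed metric. The following is a minimal sketch, not the lab's actual output format: the scores, model names, slice name, and function names are all made up for illustration.

```python
# Hypothetical validation scores; in practice these come from your own run.
validation_scores = {
    "baseline_majority": {"macro_f1": 0.41, "accuracy": 0.72},
    "logistic_regression": {"macro_f1": 0.63, "accuracy": 0.78},
    "gradient_boosting": {"macro_f1": 0.66, "accuracy": 0.77},
}

PRIMARY_METRIC = "macro_f1"  # chosen before the first run and never swapped


def rank_models(scores, metric):
    """Return (model, score) pairs sorted best-first under one fixed metric."""
    return sorted(
        ((name, values[metric]) for name, values in scores.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )


def short_report(leaderboard, baseline_name, weakest_slice, stopping_reason):
    """Build the four-line report: baseline, selected model, slice, stop reason."""
    scores = dict(leaderboard)
    selected, selected_score = leaderboard[0]
    return "\n".join([
        f"baseline: {baseline_name} ({PRIMARY_METRIC}={scores[baseline_name]:.2f})",
        f"selected: {selected} ({PRIMARY_METRIC}={selected_score:.2f})",
        f"weakest slice: {weakest_slice}",
        f"stopping reason: {stopping_reason}",
    ])


leaderboard = rank_models(validation_scores, PRIMARY_METRIC)
report = short_report(
    leaderboard,
    baseline_name="baseline_majority",
    weakest_slice="new_users (hypothetical slice name)",
    stopping_reason="one improvement made; iteration budget spent",
)
print(report)
```

Note that accuracy would rank `logistic_regression` first here. Ranking by the metric named in advance is what keeps the selection defensible.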

What To Inspect

By the end of the track, the learner should have inspected:

baseline versus selected model under one fixed metric rule
one ranked validation leaderboard
one weakest-slice comparison with support counts
one clean submission artifact tied to a named run
one chronological comparison showing whether the winner survives a stricter split
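
A weakest-slice comparison with support counts can be sketched in a few lines. The example below assumes a toy row format (slice name, true label, baseline prediction, selected-model prediction) and uses per-slice accuracy; both choices are assumptions for illustration and may differ from what the lab scripts produce.

```python
from collections import defaultdict

# Hypothetical validation rows: (slice_name, true_label, baseline_pred, selected_pred)
rows = [
    ("new_users", 1, 0, 1),
    ("new_users", 0, 0, 0),
    ("new_users", 1, 0, 0),
    ("returning", 1, 1, 1),
    ("returning", 0, 0, 0),
    ("returning", 0, 0, 0),
    ("returning", 1, 1, 1),
]


def slice_table(rows):
    """Per-slice accuracy for both models, with the support behind each number."""
    counts = defaultdict(lambda: {"support": 0, "baseline_hits": 0, "selected_hits": 0})
    for slice_name, truth, base_pred, sel_pred in rows:
        entry = counts[slice_name]
        entry["support"] += 1
        entry["baseline_hits"] += int(base_pred == truth)
        entry["selected_hits"] += int(sel_pred == truth)
    return {
        name: {
            "support": e["support"],
            "baseline_acc": e["baseline_hits"] / e["support"],
            "selected_acc": e["selected_hits"] / e["support"],
        }
        for name, e in counts.items()
    }


table = slice_table(rows)

# The weakest slice is defined against the baseline so the target does not move
# when a new model appears.
weakest = min(table, key=lambda name: table[name]["baseline_acc"])

for name, stats in table.items():
    print(name, stats)
print("weakest slice:", weakest)
```

Printing the support next to each accuracy matters because a slice with three rows can swing widely on one prediction; the count tells the reader how much weight the comparison can bear.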

Common Failure Modes

changing the split after the first leaderboard appears
choosing the winner with whichever metric flatters the latest run
adding extra models because they exist, not because the evidence is weak
reading the overall score before checking the weakest slice
writing the report after the fact without a real stopping rule
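
The last failure mode is easier to avoid when the stopping rule exists as something executable before the first run. Here is a minimal sketch; the function name and all three thresholds (`target_gain`, `max_iterations`, `min_minutes`) are illustrative assumptions, not values prescribed by the track.

```python
def stop_or_continue(gain, iterations_used, minutes_left, *,
                     target_gain=0.02, max_iterations=1, min_minutes=15):
    """Return (decision, reason) from a rule fixed before the first run.

    gain is the selected model's improvement over the baseline on the
    primary metric. All thresholds are placeholder assumptions.
    """
    if gain >= target_gain:
        return ("stop", "target gain over the baseline reached")
    if iterations_used >= max_iterations:
        return ("stop", "iteration budget spent")
    if minutes_left < min_minutes:
        return ("stop", "too little time for an accountable iteration")
    return ("continue", "one deliberate improvement is still justified")


# Baseline only, plenty of time: one improvement is allowed.
print(stop_or_continue(gain=0.0, iterations_used=0, minutes_left=40))

# One iteration done with a clear gain: stop and write the report.
print(stop_or_continue(gain=0.03, iterations_used=1, minutes_left=25))
```

Because the function returns the reason along with the decision, the stopping line of the report can be copied from the rule instead of reconstructed afterwards.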

Exit Standard

Before leaving this track, the learner should be able to:

defend the baseline and the selected model under one fixed rule
explain whether the winner fixed the main weakness or only improved the average
say whether the chronological split changed the decision
point to the first artifact they would show if someone challenged the result
make a real stop-or-continue call instead of hiding behind "more experiments"
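
Whether the chronological split changed the decision can be checked mechanically once both leaderboards exist. The sketch below uses invented scores and model names purely to show the comparison; it does not reproduce the output of the chronological workflow script.

```python
def winner(scores):
    """Name of the best model under the one fixed primary metric."""
    return max(scores, key=scores.get)


def split_changed_decision(base_scores, chronological_scores):
    """True when the stricter split would have selected a different model."""
    return winner(base_scores) != winner(chronological_scores)


# Hypothetical primary-metric scores under each split.
base_split = {
    "baseline": 0.41,
    "logistic_regression": 0.63,
    "gradient_boosting": 0.66,
}
chronological_split = {
    "baseline": 0.40,
    "logistic_regression": 0.61,
    "gradient_boosting": 0.55,
}

changed = split_changed_decision(base_split, chronological_split)
print("base winner:", winner(base_split))
print("chronological winner:", winner(chronological_split))
print("decision changed:", changed)
```

In this made-up case the base-split winner loses under the chronological split, which is exactly the situation the report should state plainly.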