Track 09

Mock Tasks and Timed Workflows

This track turns open-ended work into a disciplined task loop: baseline first, fixed validation rule, one deliberate improvement, one slice check, and a clean handoff at the end.

Primary Goal

Make One Strong Decision Chain

The point is not to try everything. The point is to preserve a clean split, one rule, and one accountable iteration under time pressure.

You Will Practice

Baseline, Triage, Stop Rule

Model ladders, fixed metric ranking, weak-slice inspection, submission hygiene, and short reports that explain why the workflow stopped.

Best First Move

Run Two Timed Demos

Run the baseline-first and error-triage examples first, then move into the full workflow once the decision rhythm feels familiar.

Use This Track When

  • you can already run a baseline and read a validation table
  • the main weakness is stopping discipline rather than setup
  • you need practice defending one iteration under a time budget
  • you want a route that feels closer to competition or real task pressure

What This Track Is Training

This track trains one practical rule:

  • do not spend a timed task on moves you cannot justify

That means the learner should keep these fixed and visible:

  1. the split rule
  2. the primary metric
  3. the baseline
  4. the weakest slice
  5. the stopping condition
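
One way to keep those five items fixed and visible is to write them down as an immutable record before the clock starts. The sketch below is illustrative only: the `TaskPlan` class, its field names, and the example values are assumptions, not part of the track's code.

```python
from dataclasses import dataclass, replace, FrozenInstanceError


@dataclass(frozen=True)
class TaskPlan:
    """Hypothetical record of the five items that stay fixed during a timed task."""
    split_rule: str
    primary_metric: str
    baseline: str
    weakest_slice: str
    stop_condition: str


# Example values are made up for illustration.
plan = TaskPlan(
    split_rule="stratified 80/20, seed 0",
    primary_metric="macro_f1",
    baseline="majority-class predictor",
    weakest_slice="not yet measured",
    stop_condition="stop when the last iteration gains less than 0.01",
)


def try_change_metric(p: TaskPlan, new_metric: str) -> bool:
    """Return True if the metric was changed in place, False if the plan refused."""
    try:
        p.primary_metric = new_metric  # frozen dataclasses reject assignment
    except FrozenInstanceError:
        return False
    return True


# Recording the weakest slice is a deliberate step that produces a new record.
plan_after_slice_check = replace(plan, weakest_slice="region_b")
```

Because the record is frozen, swapping the metric mid-task fails loudly, while recording the weakest slice after the first slice check requires an explicit `replace` call that shows up in the history of the task.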

First Session

Use this order:

  1. read Baseline-First Task Solving
  2. run academy/.venv/bin/python academy/examples/mock-task-recipes/baseline_first_demo.py
  3. run academy/.venv/bin/python academy/examples/mock-task-recipes/error_triage_demo.py
  4. read one clinic such as Public/Private Restraint
  5. write one stop-or-continue note before touching the full lab

Full Track Loop

For the complete workflow:

  1. run the two short timed-task examples first
  2. run academy/.venv/bin/python academy/labs/mock-tasks-and-timed-workflows/src/mock_task_workflow.py
  3. inspect the leaderboard and weakest-slice summary before reading the holdout summary
  4. run academy/.venv/bin/python academy/labs/mock-tasks-and-timed-workflows/src/chronological_mock_task_workflow.py only after you can read the base workflow's output without difficulty
  5. finish the matching exercises in academy/exercises/mock-tasks-and-timed-workflows/
  6. keep one short report with the baseline, selected model, weakest slice, and stopping reason
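
The short report in the last step can be as small as four labeled lines. The following is a minimal sketch with a hypothetical `short_report` helper and made-up example values; the lab scripts may produce their own report format.

```python
def short_report(baseline: str, selected: str, weakest_slice: str, stop_reason: str) -> str:
    """Build a four-line report; the layout is an assumption, not the lab's format."""
    lines = [
        f"Baseline: {baseline}",
        f"Selected model: {selected}",
        f"Weakest slice: {weakest_slice}",
        f"Stopping reason: {stop_reason}",
    ]
    return "\n".join(lines)


# Example values are invented for illustration.
report = short_report(
    baseline="majority-class predictor",
    selected="model_a",
    weakest_slice="region_b (support 2)",
    stop_reason="last iteration gained less than the agreed minimum",
)
print(report)
```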

What To Inspect

By the end of the track, the learner should have inspected:

  • baseline versus selected model under one fixed metric rule
  • one ranked validation leaderboard
  • one weakest-slice comparison with support counts
  • one clean submission artifact tied to a named run
  • one chronological comparison showing whether the winner survives a stricter split
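
A weakest-slice comparison with support counts can be sketched in a few lines. This example assumes per-row predictions tagged with a slice name and uses accuracy as the per-slice score; the `slice_report` function and the sample rows are hypothetical and not taken from the lab code.

```python
from collections import defaultdict


def slice_report(rows):
    """rows: iterable of (slice_name, y_true, y_pred).

    Returns a list of (slice_name, accuracy, support), worst slice first.
    """
    correct = defaultdict(int)
    support = defaultdict(int)
    for name, y_true, y_pred in rows:
        support[name] += 1
        correct[name] += int(y_true == y_pred)
    report = [(name, correct[name] / support[name], support[name]) for name in support]
    # Sort by accuracy, then by name so ties are ordered deterministically.
    return sorted(report, key=lambda r: (r[1], r[0]))


# Made-up rows for illustration.
rows = [
    ("region_a", 1, 1), ("region_a", 0, 0), ("region_a", 1, 1), ("region_a", 0, 1),
    ("region_b", 1, 0), ("region_b", 0, 0),
]

for name, accuracy, n in slice_report(rows):
    print(f"{name}: accuracy={accuracy:.2f} support={n}")
```

Reporting the support next to each score matters because a slice with very few rows can look like the weakest one by chance.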

Common Failure Modes

  • changing the split after the first leaderboard appears
  • choosing the winner with whichever metric flatters the latest run
  • adding extra models because they exist, not because the evidence is weak
  • reading the overall score before checking the weakest slice
  • writing the report after the fact without a real stopping rule
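
The second failure mode is easy to see with a small example. In the sketch below the scores are invented, and `PRIMARY_METRIC` and `rank_runs` are hypothetical names; the point is only that the ranking function takes no metric argument, so the winner cannot quietly change with the metric.

```python
# Chosen before any run is scored; hypothetical choice for this sketch.
PRIMARY_METRIC = "macro_f1"


def rank_runs(runs):
    """Rank runs by the one fixed metric, best first."""
    return sorted(runs, key=lambda r: r[PRIMARY_METRIC], reverse=True)


# Invented scores for illustration only.
runs = [
    {"name": "baseline", "macro_f1": 0.61, "accuracy": 0.80},
    {"name": "model_a", "macro_f1": 0.66, "accuracy": 0.79},
    {"name": "model_b", "macro_f1": 0.64, "accuracy": 0.83},
]

leaderboard = rank_runs(runs)
winner = leaderboard[0]["name"]

# What a metric switch after the fact would have selected instead.
flattered = max(runs, key=lambda r: r["accuracy"])["name"]
```

Here the fixed rule selects `model_a`, while switching to accuracy after seeing the table would select `model_b`. Both choices can be defended in isolation, which is why the rule has to be set before the leaderboard exists.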

Exit Standard

Before leaving this track, the learner should be able to:

  • defend the baseline and the selected model under one fixed rule
  • explain whether the winner fixed the main weakness or only improved the average
  • say whether the chronological split changed the decision
  • point to the first artifact they would show if someone challenged the result
  • make a real stop-or-continue call instead of hiding behind "more experiments"
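
A stop-or-continue call is easier to defend when the rule is written down before the last score arrives. The sketch below shows one possible rule, assuming a minimum-gain threshold and a time budget; the function name, parameters, and thresholds are hypothetical and should be replaced by whatever rule the learner actually committed to.

```python
def stop_or_continue(previous_best, current_best, min_gain, minutes_left, minutes_per_iteration):
    """Return (decision, reason) under one possible stopping rule.

    Stop if there is no time for another full iteration, or if the last
    iteration improved the primary metric by less than min_gain.
    """
    if minutes_left < minutes_per_iteration:
        return "stop", "not enough time left for another accountable iteration"
    gain = current_best - previous_best
    if gain < min_gain:
        return "stop", f"last gain {gain:.3f} is below the agreed minimum {min_gain:.3f}"
    return "continue", f"last gain {gain:.3f} meets the minimum and time remains"


# Invented numbers for illustration.
decision, reason = stop_or_continue(
    previous_best=0.61,
    current_best=0.66,
    min_gain=0.01,
    minutes_left=40,
    minutes_per_iteration=15,
)
print(decision, "-", reason)
```

The reason string doubles as the stopping reason for the short report, so the decision and its justification come from the same place.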