Track 10

Imbalanced Triage and Review Budgets

This track turns rare-event scoring into a queue policy: rank honestly, choose one review budget, inspect the weak slice, and defend the operating point instead of hiding behind accuracy.

Primary Goal

Choose The Queue, Not Just The Model

The point is to decide which cases get reviewed under a fixed budget, not to celebrate a headline metric that ignores queue pressure.

Best For

Rare Positives With Fixed Review Capacity

Use this track when positives are scarce, ranking quality matters more than accuracy, and the operating point has to match a real manual-review budget.

Exit Rule

One Defensible Queue Policy

You are done when you can name the model, the budget, the operating rule, and the weak slice in one short note without changing the setup after the fact.

Use This Track When

  • the positive class is rare and plain accuracy is already misleading
  • the next decision is about which cases fit inside a fixed review queue
  • you need ranked scoring, budget curves, and one slice check to feel mechanical

What This Track Is Training

This track trains one practical rule:

  • choose one queue policy under one fixed budget and defend it with ranked evidence, not with generic accuracy claims

That means the learner should be able to keep these explicit:

  • the primary ranking metric
  • the review budget
  • the operating rule: top-k or threshold
  • the weakest slice
  • the handoff artifact that explains the policy
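The five items above can be pinned down in one small record. A minimal sketch, assuming nothing about the track's actual outputs; every field value here is a hypothetical placeholder:

```python
# A sketch of the five things a queue note keeps explicit.
# All values are hypothetical placeholders, not track outputs.
queue_policy = {
    "ranking_metric": "average_precision",   # the primary ranking metric
    "review_budget": 200,                    # cases the queue can absorb
    "operating_rule": "top-k",               # top-k or threshold
    "weak_slice": "low-signal band",         # the slice the queue serves worst
    "handoff_artifact": "triage_report.md",  # where the policy is explained
}

# The note is complete only when every field is filled in.
assert all(queue_policy.values())
```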

First Session

Use this order:

  1. Imbalanced Metrics and Review Budgets
  2. from repo root run academy/.venv/bin/python academy/examples/mock-task-recipes/imbalance_metric_demo.py
  3. from repo root run academy/.venv/bin/python academy/examples/decision-recipes/imbalanced_review_budget_demo.py
  4. write one short note on why top-k and threshold solve different operational problems
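The note in step 4 hinges on one mechanical difference: top-k fixes the queue size and lets the score bar float, while a threshold fixes the score bar and lets the queue size float. A minimal sketch with made-up scores (not from the track's demos):

```python
# Synthetic scores, best cases first; illustrative only.
scores = [0.95, 0.90, 0.72, 0.70, 0.40, 0.30, 0.10, 0.05]

# Top-k: the budget fixes the queue size, whatever the scores look like.
k = 3
topk_queue = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

# Threshold: the score bar is fixed, so the queue size moves with the data.
tau = 0.5
threshold_queue = [i for i, s in enumerate(scores) if s >= tau]

print(len(topk_queue))       # always exactly k = 3
print(len(threshold_queue))  # 4 here, but it shifts when the score mix shifts
```

With a fixed reviewer headcount the top-k queue is the safer default; a threshold only works operationally if someone is watching how many cases it actually sends to review.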

Full Track Loop

For the complete workflow:

  1. review the imbalance topic and keep one fixed metric rule before comparing models
  2. from repo root run academy/.venv/bin/python academy/examples/mock-task-recipes/review_budget_demo.py
  3. from repo root run academy/.venv/bin/python academy/examples/mock-task-recipes/error_triage_demo.py
  4. from repo root run academy/.venv/bin/python academy/labs/imbalanced-triage-and-review-budgets/src/triage_budget_workflow.py
  5. finish the matching exercises in academy/exercises/imbalanced-triage-and-review-budgets/
  6. keep one short queue note with the selected model, budget, operating rule, weak slice, and next test

What To Inspect

By the end of the track, the learner should have inspected:

  • validation_leaderboard.csv for the model ranking under one fixed rule
  • review_budget_summary.csv and holdout_budget_summary.csv for queue precision and captured recall at the same budgets
  • slice_summary.csv for the under-served band that still limits the queue
  • baseline_submission.csv, budget_submission.csv, and triage_report.md so the handoff is tied to a real policy
  • the difference between a top-k queue and a thresholded queue before claiming one is better
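A slice check like the one behind slice_summary.csv is just captured recall grouped by slice instead of pooled. A minimal sketch with hypothetical slice names and synthetic cases; the overall number can look fine while one band gets nothing:

```python
from collections import defaultdict

# Each row: (slice_name, is_positive, sent_to_review). Illustrative only.
cases = [
    ("band_a", 1, 1), ("band_a", 1, 1), ("band_a", 0, 0),
    ("band_b", 1, 0), ("band_b", 1, 0), ("band_b", 0, 1),
]

captured = defaultdict(lambda: [0, 0])  # slice -> [positives caught, positives total]
for slice_name, positive, reviewed in cases:
    if positive:
        captured[slice_name][1] += 1
        captured[slice_name][0] += reviewed

for name, (caught, total) in sorted(captured.items()):
    print(name, caught / total)  # band_b's 0.0 is what the pooled average hides
```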

Common Failure Modes

  • trusting accuracy on a rare-event task
  • choosing a threshold without checking how many cases it actually sends to review
  • changing the model rule and the budget rule at the same time
  • picking the best-looking queue on validation and hiding the weak slice
  • saving a submission without naming the budget and operating rule behind it

Exit Standard

Before leaving this track, the learner should be able to:

  • explain why ranked evidence beats plain accuracy for the task
  • choose one model under one fixed metric rule
  • choose one operating point under one fixed review budget
  • name the weakest slice instead of averaging it away
  • hand off a clean, budget-aware submission and short report

That is enough to move into Structured Post-Model Algorithms or a later adaptation track where the decision layer gets even more explicit.