Track 10

Imbalanced Triage and Review Budgets

This track turns rare-event scoring into a queue policy: rank honestly, choose one review budget, inspect the weak slice, and defend the operating point instead of hiding behind accuracy.

Primary Goal

Choose The Queue, Not Just The Model

The point is to decide which cases get reviewed under a fixed budget, not to celebrate a headline metric that ignores queue pressure.

Best For

Rare Positives With Fixed Review Capacity

Use this track when positives are scarce, ranking quality matters more than accuracy, and the operating point has to match a real manual-review budget.

Exit Rule

One Defensible Queue Policy

You are done when you can name the model, the budget, the operating rule, and the weak slice in one short note without changing the setup after the fact.

Use This Track When

  • the positive class is rare and plain accuracy is already misleading
  • the next decision is about which cases fit inside a fixed review queue
  • you need ranked scoring, budget curves, and one slice check to feel mechanical

What This Track Is Training

This track trains one practical rule:

  • choose one queue policy under one fixed budget and defend it with ranked evidence, not with generic accuracy claims

That means the learner should be able to keep these explicit:

  • the primary ranking metric
  • the review budget
  • the operating rule: top-k or threshold
  • the weakest slice
  • the handoff artifact that explains the policy
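The five items above can be pinned down in one small record. A minimal sketch, assuming nothing about the track's actual outputs; every field value here is a hypothetical placeholder:

```python
# A sketch of the five things a queue note keeps explicit.
# All values are hypothetical placeholders, not track outputs.
queue_policy = {
    "ranking_metric": "average_precision",   # the primary ranking metric
    "review_budget": 200,                    # cases the queue can absorb
    "operating_rule": "top-k",               # top-k or threshold
    "weak_slice": "low-signal band",         # the slice the queue serves worst
    "handoff_artifact": "triage_report.md",  # where the policy is explained
}

# The note is complete only when every field is filled in.
assert all(queue_policy.values())
```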

First Session

Use this order:

  1. Imbalanced Metrics and Review Budgets
  2. from repo root run academy/.venv/bin/python academy/examples/mock-task-recipes/imbalance_metric_demo.py
  3. from repo root run academy/.venv/bin/python academy/examples/decision-recipes/imbalanced_review_budget_demo.py
  4. write one short note on why top-k and threshold solve different operational problems
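The note in step 4 hinges on one mechanical difference: top-k fixes the queue size and lets the score bar float, while a threshold fixes the score bar and lets the queue size float. A minimal sketch with made-up scores (not from the track's demos):

```python
# Synthetic scores, best cases first; illustrative only.
scores = [0.95, 0.90, 0.72, 0.70, 0.40, 0.30, 0.10, 0.05]

# Top-k: the budget fixes the queue size, whatever the scores look like.
k = 3
topk_queue = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

# Threshold: the score bar is fixed, so the queue size moves with the data.
tau = 0.5
threshold_queue = [i for i, s in enumerate(scores) if s >= tau]

print(len(topk_queue))       # always exactly k = 3
print(len(threshold_queue))  # 4 here, but it shifts when the score mix shifts
```

With a fixed reviewer headcount the top-k queue is the safer default; a threshold only works operationally if someone is watching how many cases it actually sends to review.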

Full Track Loop

For the complete workflow:

  1. review the imbalance topic and keep one fixed metric rule before comparing models
  2. from repo root run academy/.venv/bin/python academy/examples/mock-task-recipes/review_budget_demo.py
  3. from repo root run academy/.venv/bin/python academy/examples/mock-task-recipes/error_triage_demo.py
  4. from repo root run academy/.venv/bin/python academy/labs/imbalanced-triage-and-review-budgets/src/triage_budget_workflow.py
  5. finish the matching exercises in academy/exercises/imbalanced-triage-and-review-budgets/
  6. keep one short queue note with the selected model, budget, operating rule, weak slice, and next test

What To Inspect

By the end of the track, the learner should have inspected:

  • validation_leaderboard.csv for the model ranking under one fixed rule
  • review_budget_summary.csv and holdout_budget_summary.csv for queue precision and captured recall at the same budgets
  • slice_summary.csv for the under-served band that still limits the queue
  • baseline_submission.csv, budget_submission.csv, and triage_report.md so the handoff is tied to a real policy
  • the difference between a top-k queue and a thresholded queue before claiming one is better
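A slice check like the one behind slice_summary.csv is just captured recall grouped by slice instead of pooled. A minimal sketch with hypothetical slice names and synthetic cases; the overall number can look fine while one band gets nothing:

```python
from collections import defaultdict

# Each row: (slice_name, is_positive, sent_to_review). Illustrative only.
cases = [
    ("band_a", 1, 1), ("band_a", 1, 1), ("band_a", 0, 0),
    ("band_b", 1, 0), ("band_b", 1, 0), ("band_b", 0, 1),
]

captured = defaultdict(lambda: [0, 0])  # slice -> [positives caught, positives total]
for slice_name, positive, reviewed in cases:
    if positive:
        captured[slice_name][1] += 1
        captured[slice_name][0] += reviewed

for name, (caught, total) in sorted(captured.items()):
    print(name, caught / total)  # band_b's 0.0 is what the pooled average hides
```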

Common Failure Modes

  • trusting accuracy on a rare-event task
  • choosing a threshold without checking how many cases it actually sends to review
  • changing the model rule and the budget rule at the same time
  • picking the best-looking queue on validation and hiding the weak slice
  • saving a submission without naming the budget and operating rule behind it

Exit Standard

Before leaving this track, the learner should be able to:

  • explain why ranked evidence beats plain accuracy for the task
  • choose one model under one fixed metric rule
  • choose one operating point under one fixed review budget
  • name the weakest slice instead of averaging it away
  • hand off a clean, budget-aware submission and short report

That is enough to move into Structured Post-Model Algorithms or a later adaptation track where the decision layer gets even more explicit.