Clinic 03
Review Budget Freeze
The model is good enough to automate some cases, but reviewer capacity is fixed. You have to choose the policy you can actually defend this week.
Situation
One Queue, Three Policies
All three policies score well enough to sound plausible, but only one fits the review budget without hiding the hardest cases.
Your Job
Freeze One Policy
Choose the policy you would run this week, decide whether to keep tuning, and say what evidence would change your decision.
Bad Habit To Avoid
Best Metric, No Budget
If the chosen policy breaks the queue, it is not the right policy no matter how flattering the score looks.
Situation¶
You are reviewing a rare-event triage workflow with a hard rule:
- reviewers can handle at most 15 percent of the weekly queue
- the critical slice is
new_vendor - the policy can automate obvious negatives and obvious positives, but the middle cases should go to review
Artifact Packet¶
Read this packet before you decide:
| policy | review load | auto-positive precision | auto-negative miss rate | new_vendor review load |
calibrated? |
|---|---|---|---|---|---|
wide_band |
0.34 | 0.44 | 0.07 | 0.49 | yes |
budget_fit_band |
0.15 | 0.58 | 0.12 | 0.18 | yes |
strict_positive_only |
0.09 | 0.71 | 0.29 | 0.07 | yes |
The tempting move is obvious: strict_positive_only has the best precision.
The harder question is the real one: which policy fits the budget while still catching enough of the hard cases to be defensible?
Decision Prompt¶
Write the note before you open the reveal.
Your note should answer:
- Which policy would you freeze right now?
- Would you stop or continue?
- Which single metric or slice drove the decision most?
- What evidence would justify changing the decision?
Keep the note short. Four to six sentences is enough.
Strong Reasoning Looks Like¶
- it checks the review limit before celebrating precision
- it uses the miss rate and the
new_vendorslice together - it prefers a budget-fitting policy over a flattering but operationally invalid one
- it names one clear next policy change instead of asking for vague more tuning
Common Wrong Moves¶
- choosing the highest precision policy without mentioning the miss rate
- shipping the widest review band even though it breaks capacity
- optimizing for one global number while the critical slice gets worse
- saying “continue” without naming what threshold or band you would change
Reference Reveal¶
Open only after you write the note
The reference choice is: - `selected_policy = budget_fit_band` - `decision = stop until the next review window` Why: - it is the only policy that fits the hard 15 percent review budget - its miss rate is still materially lower than the strict policy - the `new_vendor` slice remains visible instead of being pushed out of the queue The practical lesson is simple: the best policy is the one that survives both the metric and the queue constraint.What To Do Next¶
After this clinic:
- open Imbalanced Metrics and Review Budgets
- open Selective Prediction and Review Budgets
- use Imbalanced Triage and Review Budgets when you want the full budget-aware workflow