Skip to content

Clinic 03

Review Budget Freeze

The model is good enough to automate some cases, but reviewer capacity is fixed. You have to choose the policy you can actually defend this week.

Situation

One Queue, Three Policies

All three policies score well enough to sound plausible, but only one fits the review budget without hiding the hardest cases.

Your Job

Freeze One Policy

Choose the policy you would run this week, decide whether to keep tuning, and say what evidence would change your decision.

Bad Habit To Avoid

Best Metric, No Budget

If the chosen policy breaks the queue, it is not the right policy no matter how flattering the score looks.

Situation

You are reviewing a rare-event triage workflow with a hard rule:

  • reviewers can handle at most 15 percent of the weekly queue
  • the critical slice is new_vendor
  • the policy can automate obvious negatives and obvious positives, but the middle cases should go to review

Artifact Packet

Read this packet before you decide:

policy review load auto-positive precision auto-negative miss rate new_vendor review load calibrated?
wide_band 0.34 0.44 0.07 0.49 yes
budget_fit_band 0.15 0.58 0.12 0.18 yes
strict_positive_only 0.09 0.71 0.29 0.07 yes

The tempting move is obvious: strict_positive_only has the best precision.

The harder question is the real one: which policy fits the budget while still catching enough of the hard cases to be defensible?

Decision Prompt

Write the note before you open the reveal.

Your note should answer:

  1. Which policy would you freeze right now?
  2. Would you stop or continue?
  3. Which single metric or slice drove the decision most?
  4. What evidence would justify changing the decision?

Keep the note short. Four to six sentences is enough.

Strong Reasoning Looks Like

  • it checks the review limit before celebrating precision
  • it uses the miss rate and the new_vendor slice together
  • it prefers a budget-fitting policy over a flattering but operationally invalid one
  • it names one clear next policy change instead of asking for vague more tuning

Common Wrong Moves

  • choosing the highest precision policy without mentioning the miss rate
  • shipping the widest review band even though it breaks capacity
  • optimizing for one global number while the critical slice gets worse
  • saying “continue” without naming what threshold or band you would change

Reference Reveal

Open only after you write the note The reference choice is: - `selected_policy = budget_fit_band` - `decision = stop until the next review window` Why: - it is the only policy that fits the hard 15 percent review budget - its miss rate is still materially lower than the strict policy - the `new_vendor` slice remains visible instead of being pushed out of the queue The practical lesson is simple: the best policy is the one that survives both the metric and the queue constraint.

What To Do Next

After this clinic:

  1. open Imbalanced Metrics and Review Budgets
  2. open Selective Prediction and Review Budgets
  3. use Imbalanced Triage and Review Budgets when you want the full budget-aware workflow