Track 19

Problem Adaptation and Post-Processing

This track is for the stage after the first model already exists: threshold search, cost-sensitive framing, noisy-label cleanup, guarded pseudo-labeling, slice-aware rules, and simple ensembling on one fixed split.

Primary Goal

Adapt The Decision Layer, Not Just The Model

The point is not to stack more tricks on top of a weak baseline. The point is to decide whether thresholding, weighting, cleanup, pseudo-label guards, or a simple ensemble changes the workflow in a way you can still defend.

You Will Practice

Thresholds, Cleanup, Guardrails

Threshold search, class weighting, noisy-label cleanup, pseudo-label safeguards, slice-aware checks, and simple score blending after the base model already runs.
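
The first of those moves, threshold search, can be sketched in a few lines. Everything below is synthetic and illustrative (the scores, the F1 objective, and the candidate grid are assumptions, not this lab's exact recipe); the one non-negotiable part is that every candidate is scored on the same fixed split:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic fixed validation split: scores loosely correlated with labels.
y = rng.integers(0, 2, size=500)
scores = np.clip(0.35 * y + rng.normal(0.3, 0.2, size=500), 0.0, 1.0)

def f1_at(threshold: float) -> float:
    """F1 of the decision rule `score >= threshold` on the fixed split."""
    pred = scores >= threshold
    tp = np.sum(pred & (y == 1))
    fp = np.sum(pred & (y == 0))
    fn = np.sum(~pred & (y == 1))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Sweep candidates on the SAME split; never re-split between candidates.
candidates = np.linspace(0.05, 0.95, 19)
best_t = max(candidates, key=f1_at)

print(f"default 0.5 F1: {f1_at(0.5):.3f}")
print(f"best threshold {best_t:.2f} -> F1 {f1_at(best_t):.3f}")
```

Because 0.5 is itself one of the candidates, the chosen threshold can never score worse than the naive default on this split.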

Best First Move

Run The Guardrails Demo Before The Full Lab

Use the short example to see which adaptation moves are even plausible. Then enter the lab only after the score-to-decision boundary is already clear.

Use This Track When

  • the baseline model exists, but the default 0.5 threshold or default queue policy is clearly weak
  • noisy labels, rare positives, or unlabeled data tempt you to add post-model rules
  • you need to decide whether weighting, cleanup, pseudo-labeling, or ensembling earns its place
  • the real deployment question is about operating point, budget, or slice behavior rather than raw model family choice
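
When the deployment question is about operating point or budget, the same threshold sweep can rank candidates by expected cost instead of a symmetric metric. The cost ratio below is invented for illustration; the structure (one cost per error type, one fixed split) is the point:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=400)
scores = np.clip(0.4 * y + rng.normal(0.3, 0.2, size=400), 0.0, 1.0)

# Illustrative costs: a missed positive hurts 5x more than a false alarm.
COST_FN, COST_FP = 5.0, 1.0

def expected_cost(threshold: float) -> float:
    """Total cost of the rule `score >= threshold` on the fixed split."""
    pred = scores >= threshold
    fp = np.sum(pred & (y == 0))
    fn = np.sum(~pred & (y == 1))
    return COST_FP * fp + COST_FN * fn

best_t = min(np.linspace(0.05, 0.95, 19), key=expected_cost)
print(f"cost at 0.5 default : {expected_cost(0.5):.0f}")
print(f"cost at best {best_t:.2f}   : {expected_cost(best_t):.0f}")
```

With an asymmetric cost ratio like this, the cost-optimal threshold usually lands well away from 0.5, which is exactly the "default threshold is clearly weak" situation above.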

What This Track Is Training

  • turning model scores into an operating rule that matches the task
  • keeping every adaptation move on the same split and under the same metric story
  • separating defensible post-processing from metric chasing
  • rejecting adaptation steps that look clever but do not survive slice checks or operational constraints
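
The slice-check habit in the last bullet can be made concrete. Below is a minimal sketch with an invented "common"/"rare" slice column and a deliberately degraded rare slice: any candidate operating point should be read per slice before it is kept, because raising the threshold always costs recall, and it can cost the weak slice most:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 600
slices = rng.choice(["common", "rare"], size=n, p=[0.85, 0.15])
y = rng.integers(0, 2, size=n)
scores = np.clip(0.4 * y + rng.normal(0.3, 0.2, size=n), 0.0, 1.0)
# Illustrative assumption: the base model scores the rare slice worse.
scores[slices == "rare"] *= 0.7

def recall_per_slice(threshold: float) -> dict:
    """Positive-class recall of `score >= threshold`, split by slice."""
    out = {}
    for s in ("common", "rare"):
        mask = (slices == s) & (y == 1)
        out[s] = float(np.mean(scores[mask] >= threshold))
    return out

for t in (0.3, 0.5):
    print(t, recall_per_slice(t))
```

An aggregate metric over all 600 rows could improve while the rare-slice recall collapses; printing the table per slice is what makes that visible.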

First Session

Start from the repo root.

Run the short example first:

academy/.venv/bin/python academy/examples/problem-adaptation-recipes/adaptation_guardrails_demo.py

Then enter the full lab:

academy/.venv/bin/python -m pip install -r academy/labs/problem-adaptation-and-post-processing/requirements.txt
academy/.venv/bin/python academy/labs/problem-adaptation-and-post-processing/src/problem_adaptation_and_post_processing.py

Then open the matching exercises:

  • exercises directory: academy/exercises/problem-adaptation-and-post-processing/
  • exercises file: academy/exercises/problem-adaptation-and-post-processing/README.md

On the first pass, keep one question in view: which single adaptation move changed the final decision rule the most without breaking the split?

Full Track Loop

  1. Read Calibration and Thresholds first, and return to Honest Splits and Baselines if the baseline boundary is still weak.
  2. Run the example command from repo root and decide whether thresholding, weighting, or guardrails already look like the first lever.
  3. Run the lab from repo root and keep the validation rule fixed while you compare thresholding, cleanup, pseudo-labeling, and ensembling.
  4. Inspect the written artifacts before making any claim about the best move.
  5. Work through academy/exercises/problem-adaptation-and-post-processing/README.md and defend one kept move and one rejected move.
  6. End with one short decision note about what the adaptation layer changed beyond the base model.

What To Inspect

Look at the lab outputs in this order:

  • threshold_table.csv
  • slice_summary.csv
  • cleaning_summary.csv
  • pseudo_label_summary.csv
  • ensemble_summary.csv
  • adaptation_report.md

What to decide from them:

  • whether the best threshold is meaningfully different from the naive default
  • whether the slice story still holds after the chosen operating point
  • whether noisy-label cleanup helped more than model hopping would have
  • whether the pseudo-label pool stayed small and guarded enough to trust
  • whether the simple ensemble improved the same split honestly or just added clutter
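
The last question, whether the ensemble "improved the same split honestly", can be posed as a three-line comparison. The two models below are synthetic stand-ins; the structural point is that model A, model B, and the blend are all scored on the identical split with the identical rule:

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.integers(0, 2, size=500)
# Two base models with independent noise, scoring the same fixed split.
s_a = np.clip(0.4 * y + rng.normal(0.3, 0.25, size=500), 0.0, 1.0)
s_b = np.clip(0.4 * y + rng.normal(0.3, 0.25, size=500), 0.0, 1.0)
blend = 0.5 * (s_a + s_b)  # simple score averaging, no learned weights

def acc(scores: np.ndarray, t: float = 0.5) -> float:
    """Accuracy of `score >= t` against the fixed labels."""
    return float(np.mean((scores >= t) == y))

print("model A:", round(acc(s_a), 3))
print("model B:", round(acc(s_b), 3))
print("blend  :", round(acc(blend), 3))
```

If the blend row does not beat both single-model rows on this same split, the ensemble is clutter by the standard above and should be rejected.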

Common Failure Modes

  • guessing the threshold from intuition instead of the score table
  • changing the split while also changing the adaptation rule
  • pseudo-labeling with no confidence or agreement guardrails
  • cleaning labels with a model weaker than the one whose improvement you are claiming
  • hiding a weak slice behind an aggregate improvement
  • adding an ensemble before the best single-model policy is already defensible
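
The pseudo-labeling failure mode above has a cheap antidote: keep a candidate only when it passes both a confidence guard and a cross-model agreement guard. A minimal sketch, where the scores, the 0.9 confidence cutoff, and the 0.05 agreement band are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
# Two models score the same unlabeled pool (synthetic, correlated scores).
p_a = rng.uniform(0.0, 1.0, size=n)
p_b = np.clip(p_a + rng.normal(0.0, 0.1, size=n), 0.0, 1.0)

CONF = 0.9    # guard 1: only keep confident calls from model A
AGREE = 0.05  # guard 2: only where the two models agree closely

confident = np.maximum(p_a, 1.0 - p_a) >= CONF
agree = np.abs(p_a - p_b) <= AGREE
keep = confident & agree

pseudo_labels = (p_a[keep] >= 0.5).astype(int)
print(f"kept {keep.sum()} of {n} candidates ({keep.mean():.1%})")
```

The guards exist precisely to keep the kept pool small; a guard configuration that admits most of the unlabeled pool is a sign the cutoffs are doing no work.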

Exit Standard

Before leaving this track, a learner should be able to:

  • defend one operating point with a practical reason, not just a metric bump
  • explain whether weighting, cleanup, pseudo-labeling, or ensembling did the most defensible work
  • name the slice that still limits the workflow
  • point to the matching exercises directory and finish the post-model comparison there: academy/exercises/problem-adaptation-and-post-processing/
  • state one adaptation step they would keep and one they would reject under the same split