Track 19

Problem Adaptation and Post-Processing

This track is for the stage after the first model already exists: threshold search, cost-sensitive framing, noisy-label cleanup, guarded pseudo-labeling, slice-aware rules, and simple ensembling on one fixed split.

Primary Goal

Adapt The Decision Layer, Not Just The Model

The point is not to stack more tricks on top of a weak baseline. The point is to decide whether thresholding, weighting, cleanup, pseudo-label guards, or a simple ensemble changes the workflow in a way you can still defend.

You Will Practice

Thresholds, Cleanup, Guardrails

Threshold search, class weighting, noisy-label cleanup, pseudo-label safeguards, slice-aware checks, and simple score blending after the base model already runs.
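
The first of those moves, threshold search, can be sketched in a few lines. Everything below is synthetic and illustrative (the scores, the F1 objective, and the candidate grid are assumptions, not this lab's exact recipe); the one non-negotiable part is that every candidate is scored on the same fixed split:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic fixed validation split: scores loosely correlated with labels.
y = rng.integers(0, 2, size=500)
scores = np.clip(0.35 * y + rng.normal(0.3, 0.2, size=500), 0.0, 1.0)

def f1_at(threshold: float) -> float:
    """F1 of the decision rule `score >= threshold` on the fixed split."""
    pred = scores >= threshold
    tp = np.sum(pred & (y == 1))
    fp = np.sum(pred & (y == 0))
    fn = np.sum(~pred & (y == 1))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Sweep candidates on the SAME split; never re-split between candidates.
candidates = np.linspace(0.05, 0.95, 19)
best_t = max(candidates, key=f1_at)

print(f"default 0.5 F1: {f1_at(0.5):.3f}")
print(f"best threshold {best_t:.2f} -> F1 {f1_at(best_t):.3f}")
```

Because 0.5 is itself one of the candidates, the chosen threshold can never score worse than the naive default on this split.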

Best First Move

Run The Guardrails Demo Before The Full Lab

Use the short example to see which adaptation moves are even plausible. Then enter the lab only after the score-to-decision boundary is already clear.

Use This Track When

  • the baseline model exists, but the default 0.5 threshold or default queue policy is clearly weak
  • noisy labels, rare positives, or unlabeled data tempt you to add post-model rules
  • you need to decide whether weighting, cleanup, pseudo-labeling, or ensembling earns its place
  • the real deployment question is about operating point, budget, or slice behavior rather than raw model family choice
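
When the deployment question is about operating point or budget, the same threshold sweep can rank candidates by expected cost instead of a symmetric metric. The cost ratio below is invented for illustration; the structure (one cost per error type, one fixed split) is the point:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=400)
scores = np.clip(0.4 * y + rng.normal(0.3, 0.2, size=400), 0.0, 1.0)

# Illustrative costs: a missed positive hurts 5x more than a false alarm.
COST_FN, COST_FP = 5.0, 1.0

def expected_cost(threshold: float) -> float:
    """Total cost of the rule `score >= threshold` on the fixed split."""
    pred = scores >= threshold
    fp = np.sum(pred & (y == 0))
    fn = np.sum(~pred & (y == 1))
    return COST_FP * fp + COST_FN * fn

best_t = min(np.linspace(0.05, 0.95, 19), key=expected_cost)
print(f"cost at 0.5 default : {expected_cost(0.5):.0f}")
print(f"cost at best {best_t:.2f}   : {expected_cost(best_t):.0f}")
```

With an asymmetric cost ratio like this, the cost-optimal threshold usually lands well away from 0.5, which is exactly the "default threshold is clearly weak" situation above.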

What This Track Is Training

  • turning model scores into an operating rule that matches the task
  • keeping every adaptation move on the same split and under the same metric story
  • separating defensible post-processing from metric chasing
  • rejecting adaptation steps that look clever but do not survive slice checks or operational constraints
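
The slice-check habit in the last bullet can be made concrete. Below is a minimal sketch with an invented "common"/"rare" slice column and a deliberately degraded rare slice: any candidate operating point should be read per slice before it is kept, because raising the threshold always costs recall, and it can cost the weak slice most:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 600
slices = rng.choice(["common", "rare"], size=n, p=[0.85, 0.15])
y = rng.integers(0, 2, size=n)
scores = np.clip(0.4 * y + rng.normal(0.3, 0.2, size=n), 0.0, 1.0)
# Illustrative assumption: the base model scores the rare slice worse.
scores[slices == "rare"] *= 0.7

def recall_per_slice(threshold: float) -> dict:
    """Positive-class recall of `score >= threshold`, split by slice."""
    out = {}
    for s in ("common", "rare"):
        mask = (slices == s) & (y == 1)
        out[s] = float(np.mean(scores[mask] >= threshold))
    return out

for t in (0.3, 0.5):
    print(t, recall_per_slice(t))
```

An aggregate metric over all 600 rows could improve while the rare-slice recall collapses; printing the table per slice is what makes that visible.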

First Session

Start from the repo root.

Run the short example first:

academy/.venv/bin/python academy/examples/problem-adaptation-recipes/adaptation_guardrails_demo.py

Then enter the full lab:

academy/.venv/bin/python -m pip install -r academy/labs/problem-adaptation-and-post-processing/requirements.txt
academy/.venv/bin/python academy/labs/problem-adaptation-and-post-processing/src/problem_adaptation_and_post_processing.py

Then open the matching exercises:

  • exercises directory: academy/exercises/problem-adaptation-and-post-processing/
  • exercises file: academy/exercises/problem-adaptation-and-post-processing/README.md

On the first pass, keep one question in view: which single adaptation move changed the final decision rule the most without breaking the split?

Full Track Loop

  1. Read Calibration and Thresholds first, and return to Honest Splits and Baselines if the baseline boundary is still weak.
  2. Run the example command from repo root and decide whether thresholding, weighting, or guardrails already look like the first lever.
  3. Run the lab from repo root and keep the validation rule fixed while you compare thresholding, cleanup, pseudo-labeling, and ensembling.
  4. Inspect the written artifacts before making any claim about the best move.
  5. Work through academy/exercises/problem-adaptation-and-post-processing/README.md and defend one kept move and one rejected move.
  6. End with one short decision note about what the adaptation layer changed beyond the base model.

What To Inspect

Look at the lab outputs in this order:

  • threshold_table.csv
  • slice_summary.csv
  • cleaning_summary.csv
  • pseudo_label_summary.csv
  • ensemble_summary.csv
  • adaptation_report.md

What to decide from them:

  • whether the best threshold is meaningfully different from the naive default
  • whether the slice story still holds after the chosen operating point
  • whether noisy-label cleanup helped more than model hopping would have
  • whether the pseudo-label pool stayed small and guarded enough to trust
  • whether the simple ensemble improved the same split honestly or just added clutter
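
The last question, whether the ensemble "improved the same split honestly", can be posed as a three-line comparison. The two models below are synthetic stand-ins; the structural point is that model A, model B, and the blend are all scored on the identical split with the identical rule:

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.integers(0, 2, size=500)
# Two base models with independent noise, scoring the same fixed split.
s_a = np.clip(0.4 * y + rng.normal(0.3, 0.25, size=500), 0.0, 1.0)
s_b = np.clip(0.4 * y + rng.normal(0.3, 0.25, size=500), 0.0, 1.0)
blend = 0.5 * (s_a + s_b)  # simple score averaging, no learned weights

def acc(scores: np.ndarray, t: float = 0.5) -> float:
    """Accuracy of `score >= t` against the fixed labels."""
    return float(np.mean((scores >= t) == y))

print("model A:", round(acc(s_a), 3))
print("model B:", round(acc(s_b), 3))
print("blend  :", round(acc(blend), 3))
```

If the blend row does not beat both single-model rows on this same split, the ensemble is clutter by the standard above and should be rejected.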

Common Failure Modes

  • guessing the threshold from intuition instead of the score table
  • changing the split while also changing the adaptation rule
  • pseudo-labeling with no confidence or agreement guardrails
  • cleaning labels with a model weaker than the one whose improvement you are claiming
  • hiding a weak slice behind an aggregate improvement
  • adding an ensemble before the best single-model policy is already defensible
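
The pseudo-labeling failure mode above has a cheap antidote: keep a candidate only when it passes both a confidence guard and a cross-model agreement guard. A minimal sketch, where the scores, the 0.9 confidence cutoff, and the 0.05 agreement band are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
# Two models score the same unlabeled pool (synthetic, correlated scores).
p_a = rng.uniform(0.0, 1.0, size=n)
p_b = np.clip(p_a + rng.normal(0.0, 0.1, size=n), 0.0, 1.0)

CONF = 0.9    # guard 1: only keep confident calls from model A
AGREE = 0.05  # guard 2: only where the two models agree closely

confident = np.maximum(p_a, 1.0 - p_a) >= CONF
agree = np.abs(p_a - p_b) <= AGREE
keep = confident & agree

pseudo_labels = (p_a[keep] >= 0.5).astype(int)
print(f"kept {keep.sum()} of {n} candidates ({keep.mean():.1%})")
```

The guards exist precisely to keep the kept pool small; a guard configuration that admits most of the unlabeled pool is a sign the cutoffs are doing no work.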

Exit Standard

Before leaving this track, a learner should be able to:

  • defend one operating point with a practical reason, not just a metric bump
  • explain whether weighting, cleanup, pseudo-labeling, or ensembling did the most defensible work
  • name the slice that still limits the workflow
  • point to the matching exercises directory and finish the post-model comparison there: academy/exercises/problem-adaptation-and-post-processing/
  • state one adaptation step they would keep and one they would reject under the same split