Clinic 02

Leakage Or Signal?

A validation jump looks too good to ignore. The harder question is whether the gain came from a real feature or from an answer key in disguise.

Situation

Huge Gain, Wrong Reason?

The most predictive feature is also the one whose availability looks suspiciously late in the workflow.

Your Job

Ship, Reject, Or Inspect

Choose the model you would keep, name the feature you would reject or keep, and say what evidence would change your mind.

Bad Habit To Avoid

Score Jump = Progress

If the whole argument is “it scored better,” the clinic failed.

Situation

You are reviewing a support-triage model with repeated customers and a frozen grouped split.

The packet says:

  • the baseline pipeline is stable
  • one new feature causes a dramatic jump
  • the same feature is populated only after the case is resolved
  • a safer feature gives a smaller but believable gain
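The frozen grouped split mentioned above can be sketched in a few lines. This is a minimal illustration, not the clinic's actual pipeline: the tickets, customer ids, and fold count are all invented, and `GroupKFold` stands in for whatever grouping the packet's authors used.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Hypothetical toy data: 12 tickets from 4 repeated customers.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 3))
y = rng.integers(0, 2, size=12)
groups = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3])

# GroupKFold keeps every ticket from one customer on a single side of
# each split, so the model is never scored on customers it trained on.
for train_idx, test_idx in GroupKFold(n_splits=4).split(X, y, groups):
    assert set(groups[train_idx]).isdisjoint(set(groups[test_idx]))
```

Freezing the split — same groups, same fold assignment for every candidate — is what makes the rows in the packet below comparable to each other.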

Artifact Packet

Read the packet before you decide:

| candidate | grouped ROC AUC | grouped average precision | prediction-time availability | top feature |
| --- | --- | --- | --- | --- |
| baseline_pipeline | 0.712 | 0.391 | yes | recent_backlog_count |
| plus_resolution_status | 0.964 | 0.901 | no, filled after the case is closed | resolved_within_24h |
| plus_backlog_pressure | 0.748 | 0.427 | yes | recent_backlog_count + queue_pressure |

The tempting move is obvious: plus_resolution_status dominates the board.

The harder question is the one that matters: would that score survive the actual prediction timeline?
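The availability rule can be made mechanical with a small audit over feature metadata. The `filled_at` values here are hypothetical stand-ins for whatever your pipeline records; the point is that anything not populated at prediction time gets flagged before any score is read.

```python
# Hypothetical metadata for the three packet features.
features = [
    {"name": "recent_backlog_count", "filled_at": "ticket_open"},
    {"name": "queue_pressure", "filled_at": "ticket_open"},
    {"name": "resolved_within_24h", "filled_at": "case_closed"},
]

PREDICTION_TIME = "ticket_open"

# Anything filled later than prediction time is unusable,
# no matter how well it scores offline.
unsafe = [f["name"] for f in features if f["filled_at"] != PREDICTION_TIME]
print(unsafe)  # -> ['resolved_within_24h']
```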

Decision Prompt

Write the note before you open the reveal.

Your note should answer:

  1. Which candidate would you keep right now?
  2. Which feature or change would you reject immediately?
  3. Would you stop or continue?
  4. What evidence would justify changing the decision?

Keep the note short. Four to six sentences is enough.

Strong Reasoning Looks Like

  • it names the strongest score and still refuses to ship a feature that is unavailable at prediction time
  • it prefers a smaller honest gain over a dramatic invalid gain
  • it uses the grouped split and the feature-availability rule together
  • it says what next inspection would still be worth doing

Common Wrong Moves

  • shipping the best row without mentioning feature availability
  • saying “the grouped split is clean, so the feature must be safe”
  • rejecting every feature change instead of separating the unsafe one from the useful one
  • continuing without naming what you would inspect next

Reference Reveal

Open only after you write the note.

The reference choice is:

  • `selected_candidate = plus_backlog_pressure`
  • `rejected_feature = resolved_within_24h`
  • `decision = continue only with prediction-time-safe features`

Why:

  • the late feature is not available when the prediction is made
  • the huge gain is therefore invalid evidence, not a real improvement
  • the smaller grouped gain from `queue_pressure` is still honest enough to inspect further

The practical lesson is simple: a believable smaller gain is stronger than an impossible larger one.
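Why the dramatic row is not evidence: a feature filled after resolution is effectively a noisy copy of the label, and any offline metric will reward it. A synthetic sketch of that effect, using scikit-learn's `roc_auc_score` — the labels and both features are invented here:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
y = rng.integers(0, 2, size=200)

# A post-resolution feature is close to a copy of the label;
# an honest prediction-time feature is only weakly related to it.
leaky = y + rng.normal(scale=0.1, size=200)   # stand-in for resolved_within_24h
honest = y + rng.normal(scale=2.0, size=200)  # stand-in for queue_pressure

print(f"leaky AUC:  {roc_auc_score(y, leaky):.3f}")   # near-perfect
print(f"honest AUC: {roc_auc_score(y, honest):.3f}")  # modest but real
```

The near-perfect number says nothing about deployment, because at prediction time the leaky column would be empty.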

What To Do Next

After this clinic:

  1. open Leakage Patterns
  2. run the matching leakage example
  3. use scikit-learn Validation and Tuning when you want the full split-pipeline-selection workflow