Timed Checkpoint Sheets¶
Use these sheets once the topic already makes sense and you want short, timed retrieval pressure.
The rule is simple:
- set the timer first
- answer cold
- mark unstable reasoning
- open the answer key only after the timer ends
- route back to the weak topic or pack immediately
Sheet A: Validation and Leakage Sprint¶
Time: 20 minutes
Prompts:
- You computed target encoding on the full dataset before the split. What is the problem?
- You tried many hyperparameter settings and kept looking at the test set. Can the final test score still be treated as clean?
- A random row split beats a group-by-user split by a large margin. What is the first suspicion?
- A public leaderboard improves sharply but local cross-validation does not. What is the disciplined next move?
- Average F1 improved, but the weakest slice collapsed. What mistake should you avoid?
- Review capacity is capped at a fixed number per day. What should you inspect before defaulting to F1 at a 0.5 threshold?
Answer Key
1. Leakage through preprocessing; the held-out data influenced the encoder.
2. No. The test set became part of model selection.
3. Entity leakage or near-duplicate information across splits.
4. Keep the split fixed and demand stronger local evidence before trusting the jump.
5. Do not let the average score erase a slice failure that matters operationally.
6. Precision, recall, or utility at the review-budget cutoff.

Continue with: Validation, Leakage, and Model Choice
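The preprocessing-leakage failure in the first prompt is easy to reproduce by hand. A minimal sketch on an invented toy dataset (category names and labels are illustrative): a target encoder fit on all rows has already absorbed test-set labels, while one fit on training rows only has not.

```python
# Toy rows: (category, binary target). Values are illustrative only.
train = [("a", 0), ("a", 0), ("b", 1), ("b", 1)]
test = [("a", 1), ("b", 0)]

def target_means(rows):
    """Mean target per category -- a bare-bones target encoder."""
    sums, counts = {}, {}
    for cat, y in rows:
        sums[cat] = sums.get(cat, 0) + y
        counts[cat] = counts.get(cat, 0) + 1
    return {c: sums[c] / counts[c] for c in sums}

# Leaky: the encoder was fit on the full dataset, so it has seen test targets.
leaky = target_means(train + test)
# Clean: the encoder is fit on training rows only, then applied to test rows.
clean = target_means(train)

# The encodings disagree for category "a" (1/3 vs 0.0): the leaky version
# smuggled a test-set label into a training-time feature.
print(leaky["a"], clean["a"])
```

The same shape of bug appears with scalers and imputers: any statistic computed before the split lets held-out data influence the features the model trains on.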
Sheet B: Representation and Geometry Sprint¶
Time: 20 minutes
Prompts:
- Why should scale differences matter before KMeans?
- What does the first principal component optimize?
- Can a t-SNE plot prove the true number of clusters?
- In a Gaussian mixture, what happens to responsibilities when variances shrink?
- A frozen pretrained embedding plus linear head beats a scratch model on small data. Is that surprising?
- A new embedding looks cleaner in 2D but performs worse downstream. Which signal should control deployment?
Answer Key
1. Distance-based clustering is scale-sensitive, so large-scale features can dominate.
2. Variance explained along a direction.
3. No. t-SNE is an inspection tool, not proof of cluster truth.
4. Responsibilities sharpen toward the more compatible component.
5. No. Reuse often wins in low-data settings because it lowers variance.
6. The downstream task metric should control the decision.

Continue with: Unsupervised Learning and Representation
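The scale-sensitivity point in the first answer can be shown without any clustering library. A minimal sketch with made-up feature ranges: before scaling, the large-range feature decides which point is nearer; after dividing each feature by its range, the verdict flips.

```python
import math

def dist(u, v):
    """Plain Euclidean distance -- what KMeans assignments rely on."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

# Feature 1 spans roughly 0..1, feature 2 spans roughly 0..10000.
p = (0.0, 5000.0)
a = (1.0, 5000.0)   # maximally different in the small-scale feature
b = (0.0, 5100.0)   # only 1% of range apart in the large-scale feature

# Raw distances: dist(p, a) = 1, dist(p, b) = 100, so the large-scale
# feature dominates and "a" looks like the nearer point.
raw_nearest = "a" if dist(p, a) < dist(p, b) else "b"

def scale(v, ranges=(1.0, 10000.0)):
    """Divide each feature by its (assumed) range -- a crude min-max scaling."""
    return tuple(x / r for x, r in zip(v, ranges))

# After scaling: dist = 1.0 vs 0.01, so "b" is now the nearer point.
scaled_nearest = "a" if dist(scale(p), scale(a)) < dist(scale(p), scale(b)) else "b"
print(raw_nearest, scaled_nearest)
```

Since KMeans assigns points by exactly this kind of distance, the unscaled version would cluster almost entirely on the large-range feature.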
Sheet C: Training and Checkpoint Sprint¶
Time: 25 minutes
Prompts:
- Loss oscillates wildly and turns NaN. What is the first optimizer diagnosis?
- Train loss falls while validation loss rises after epoch 8. Which checkpoint should you keep?
- You have limited labeled data and a strong pretrained backbone. What is the safer first transfer move?
- The model already fits training very well but misses validation badly. Bigger network or stronger regularization?
- Why can tiny-batch BatchNorm behavior be risky during fine-tuning?
- Lower final train loss but worse best validation metric: which run wins?
Answer Key
1. The learning rate is probably too high.
2. Keep the best-validation checkpoint.
3. Start with a frozen backbone or small unfreeze, not full adaptation immediately.
4. Stronger regularization first.
5. Batch statistics can become noisy and destabilize adaptation.
6. The run with the better held-out metric wins.

Continue with: Deep Learning and Checkpoints
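The checkpoint rule in the second answer is just an argmin over validation loss. A sketch over an invented loss history in which training loss keeps falling but validation bottoms out at epoch 8:

```python
# Invented (epoch, train_loss, val_loss) history: train keeps improving,
# validation turns around after epoch 8.
history = [
    (1, 0.90, 0.88), (2, 0.75, 0.80), (3, 0.62, 0.72), (4, 0.52, 0.65),
    (5, 0.44, 0.59), (6, 0.38, 0.54), (7, 0.33, 0.51), (8, 0.29, 0.49),
    (9, 0.26, 0.52), (10, 0.23, 0.56), (11, 0.21, 0.60), (12, 0.19, 0.65),
]

def best_checkpoint(history):
    """Keep the epoch with the lowest validation loss, not the last epoch."""
    return min(history, key=lambda rec: rec[2])

epoch, train_loss, val_loss = best_checkpoint(history)
# Epoch 12 has the lowest train loss (0.19), but epoch 8 has the best
# held-out loss (0.49) -- which is also the answer to the last prompt:
# the run (or epoch) with the better held-out metric wins.
print(epoch)
```

Framework checkpoint callbacks implement this same selection; the toy version just makes explicit which column is being minimized.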
Weekly Use¶
Use the sheets in this order:
- study the matching topic or pack
- return later for one timed sheet without notes
- log the first shaky question
- revisit the same sheet after rerunning the weak topic
The page is doing its job only if it routes you back to the exact weak concept instead of just producing another score.