Timed Checkpoint Sheets¶
Use these sheets once the topic already makes sense and you want short, timed retrieval pressure.
The rule is simple:
- set the timer first
- answer cold
- mark unstable reasoning
- open the answer key only after the timer ends
- route back to the weak topic or pack immediately
Sheet A: Validation and Leakage Sprint¶
Time: 20 minutes
Prompts:
- You computed target encoding on the full dataset before the split. What is the problem?
- You tried many hyperparameter settings and kept looking at the test set. Can the final test score still be treated as clean?
- A random row split beats a group-by-user split by a large margin. What is the first suspicion?
- A public leaderboard improves sharply but local cross-validation does not. What is the disciplined next move?
- Average F1 improved, but the weakest slice collapsed. What mistake should you avoid?
- Review capacity is capped at a fixed number per day. What should you inspect before defaulting to F1 at a 0.5 threshold?
Answer Key
1. Leakage through preprocessing; the held-out data influenced the encoder.
2. No. The test set became part of model selection.
3. Entity leakage or near-duplicate information across splits.
4. Keep the split fixed and demand stronger local evidence before trusting the jump.
5. Do not let the average score erase a slice failure that matters operationally.
6. Precision, recall, or utility at the review-budget cutoff.

Continue with: Validation, Leakage, and Model Choice
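The preprocessing-leakage failure in the first prompt is easy to reproduce by hand. A minimal sketch on an invented toy dataset (category names and labels are illustrative): a target encoder fit on all rows has already absorbed test-set labels, while one fit on training rows only has not.

```python
# Toy rows: (category, binary target). Values are illustrative only.
train = [("a", 0), ("a", 0), ("b", 1), ("b", 1)]
test = [("a", 1), ("b", 0)]

def target_means(rows):
    """Mean target per category -- a bare-bones target encoder."""
    sums, counts = {}, {}
    for cat, y in rows:
        sums[cat] = sums.get(cat, 0) + y
        counts[cat] = counts.get(cat, 0) + 1
    return {c: sums[c] / counts[c] for c in sums}

# Leaky: the encoder was fit on the full dataset, so it has seen test targets.
leaky = target_means(train + test)
# Clean: the encoder is fit on training rows only, then applied to test rows.
clean = target_means(train)

# The encodings disagree for category "a" (1/3 vs 0.0): the leaky version
# smuggled a test-set label into a training-time feature.
print(leaky["a"], clean["a"])
```

The same shape of bug appears with scalers and imputers: any statistic computed before the split lets held-out data influence the features the model trains on.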
Sheet B: Representation and Geometry Sprint¶
Time: 20 minutes
Prompts:
- Why should scale differences matter before KMeans?
- What does the first principal component optimize?
- Can a t-SNE plot prove the true number of clusters?
- In a Gaussian mixture, what happens to responsibilities when variances shrink?
- A frozen pretrained embedding plus linear head beats a scratch model on small data. Is that surprising?
- A new embedding looks cleaner in 2D but performs worse downstream. Which signal should control deployment?
Answer Key
1. Distance-based clustering is scale-sensitive, so large-scale features can dominate.
2. Variance explained along a direction.
3. No. t-SNE is an inspection tool, not proof of cluster truth.
4. Responsibilities sharpen toward the more compatible component.
5. No. Reuse often wins in low-data settings because it lowers variance.
6. The downstream task metric should control the decision.

Continue with: Unsupervised Learning and Representation
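The scale-sensitivity point in the first answer can be shown without any clustering library. A minimal sketch with made-up feature ranges: before scaling, the large-range feature decides which point is nearer; after dividing each feature by its range, the verdict flips.

```python
import math

def dist(u, v):
    """Plain Euclidean distance -- what KMeans assignments rely on."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

# Feature 1 spans roughly 0..1, feature 2 spans roughly 0..10000.
p = (0.0, 5000.0)
a = (1.0, 5000.0)   # maximally different in the small-scale feature
b = (0.0, 5100.0)   # only 1% of range apart in the large-scale feature

# Raw distances: dist(p, a) = 1, dist(p, b) = 100, so the large-scale
# feature dominates and "a" looks like the nearer point.
raw_nearest = "a" if dist(p, a) < dist(p, b) else "b"

def scale(v, ranges=(1.0, 10000.0)):
    """Divide each feature by its (assumed) range -- a crude min-max scaling."""
    return tuple(x / r for x, r in zip(v, ranges))

# After scaling: dist = 1.0 vs 0.01, so "b" is now the nearer point.
scaled_nearest = "a" if dist(scale(p), scale(a)) < dist(scale(p), scale(b)) else "b"
print(raw_nearest, scaled_nearest)
```

Since KMeans assigns points by exactly this kind of distance, the unscaled version would cluster almost entirely on the large-range feature.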
Sheet C: Training and Checkpoint Sprint¶
Time: 25 minutes
Prompts:
- Loss oscillates wildly and turns NaN. What is the first optimizer diagnosis?
- Train loss falls while validation loss rises after epoch 8. Which checkpoint should you keep?
- You have limited labeled data and a strong pretrained backbone. What is the safer first transfer move?
- The model already fits training very well but misses validation badly. Bigger network or stronger regularization?
- Why can tiny-batch BatchNorm behavior be risky during fine-tuning?
- Lower final train loss but worse best validation metric: which run wins?
Answer Key
1. The learning rate is probably too high.
2. Keep the best-validation checkpoint.
3. Start with a frozen backbone or small unfreeze, not full adaptation immediately.
4. Stronger regularization first.
5. Batch statistics can become noisy and destabilize adaptation.
6. The run with the better held-out metric wins.

Continue with: Deep Learning and Checkpoints
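The checkpoint rule in the second answer is just an argmin over validation loss. A sketch over an invented loss history in which training loss keeps falling but validation bottoms out at epoch 8:

```python
# Invented (epoch, train_loss, val_loss) history: train keeps improving,
# validation turns around after epoch 8.
history = [
    (1, 0.90, 0.88), (2, 0.75, 0.80), (3, 0.62, 0.72), (4, 0.52, 0.65),
    (5, 0.44, 0.59), (6, 0.38, 0.54), (7, 0.33, 0.51), (8, 0.29, 0.49),
    (9, 0.26, 0.52), (10, 0.23, 0.56), (11, 0.21, 0.60), (12, 0.19, 0.65),
]

def best_checkpoint(history):
    """Keep the epoch with the lowest validation loss, not the last epoch."""
    return min(history, key=lambda rec: rec[2])

epoch, train_loss, val_loss = best_checkpoint(history)
# Epoch 12 has the lowest train loss (0.19), but epoch 8 has the best
# held-out loss (0.49) -- which is also the answer to the last prompt:
# the run (or epoch) with the better held-out metric wins.
print(epoch)
```

Framework checkpoint callbacks implement this same selection; the toy version just makes explicit which column is being minimized.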
Weekly Use¶
Use the sheets in this order:
- study the matching topic or pack
- return later for one timed sheet without notes
- log the first shaky question
- revisit the same sheet after rerunning the weak topic
The page is doing its job only if it routes you back to the exact weak concept instead of just producing another score.