Track 04

PyTorch Training Recipes

This track turns deep-learning mechanics into a stable workflow: clean loops, honest validation, readable curves, checkpoint discipline, and recipe changes that happen before architecture changes.

Primary Goal

Stabilize Training First

Use this track when the model itself is not yet the real bottleneck and the training workflow still needs to become trustworthy.

You Will Practice

Loops, Curves, Checkpoints

train() and eval(), no_grad(), optimizer and regularization choices, and validation-based checkpoint selection.
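The whole discipline fits in a few lines. A minimal sketch using a toy linear model on random data (the shapes, learning rate, and epoch count are illustrative assumptions, not part of the track's examples):

```python
import torch
from torch import nn

# Toy model and random data; everything here is illustrative.
torch.manual_seed(0)
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

x_train, y_train = torch.randn(64, 10), torch.randn(64, 1)
x_val, y_val = torch.randn(32, 10), torch.randn(32, 1)

for epoch in range(3):
    model.train()                  # training behavior for dropout/batch norm
    optimizer.zero_grad()
    loss = loss_fn(model(x_train), y_train)
    loss.backward()
    optimizer.step()

    model.eval()                   # inference behavior for the validation pass
    with torch.no_grad():          # no autograd bookkeeping during validation
        val_loss = loss_fn(model(x_val), y_val)
    print(f"epoch {epoch}: train={loss.item():.4f} val={val_loss.item():.4f}")
```

The point is the separation: the gradient step happens only inside `train()` mode, and the validation number is computed only inside `eval()` plus `no_grad()`.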

Exit Rule

One Defensible Recipe Note

You are done when you can say which recipe you trust, why the curves support it, and what the next smallest change should be.

Use This Track When

  • classical evaluation discipline already feels stable
  • you are ready to debug deep-learning workflows without hiding behind architecture changes
  • you want checkpointing, regularization, and curve reading to feel mechanical

What This Track Is Training

This track trains one practical rule:

  • change the recipe cleanly before you change the model creatively

That means the learner should be able to keep these explicit:

  • where training stops and validation begins
  • which checkpoint was selected and why
  • whether the curves show underfitting, overfitting, or unstable optimization
  • whether a regularization or optimizer change improved the validation story
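The curve read in the third point can be made mechanical. The sketch below labels a finished run with a crude heuristic; the `read_curves` helper and its thresholds are hypothetical assumptions, and real curve reading stays a judgment call:

```python
def read_curves(train_losses, val_losses, gap_tol=0.5, flat_tol=0.01):
    """Rough heuristic label for a finished run; thresholds are illustrative."""
    # Unstable optimization: validation loss rises at most steps.
    rises = sum(b > a for a, b in zip(val_losses, val_losses[1:]))
    if rises > len(val_losses) // 2:
        return "unstable optimization"
    # Overfitting: training keeps improving while validation lags far behind.
    if val_losses[-1] - train_losses[-1] > gap_tol:
        return "overfitting"
    # Underfitting: the training curve barely moved at all.
    if train_losses[0] - train_losses[-1] < flat_tol:
        return "underfitting"
    return "healthy"

print(read_curves([1.0, 0.6, 0.3, 0.1], [1.0, 0.7, 0.9, 1.1]))  # → overfitting
```

A helper like this is not a substitute for looking at the curves, but writing one forces the learner to state what "overfitting" or "unstable" means in numbers.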

First Session

Use this order:

  1. PyTorch Training Loops
  2. Optimizers and Regularization
  3. run academy/.venv/bin/python academy/examples/deep-learning-recipes/pytorch_training_loop_demo.py
  4. run academy/.venv/bin/python academy/examples/deep-learning-recipes/optimizer_regularization_demo.py
  5. write one sentence on whether the main risk is loop correctness, optimization, or regularization

Full Track Loop

For the complete workflow:

  1. review the two deep-learning topics in order
  2. run the loop and optimizer examples from repo root
  3. run academy/.venv/bin/python academy/labs/pytorch-training-recipes/src/training_recipe_workflow.py
  4. inspect the outputs before reading the final test result
  5. finish the matching exercises in academy/exercises/pytorch-training-recipes/
  6. keep one short recipe note with the checkpoint rule, curve read, selected recipe, and next change

What To Inspect

By the end of the track, the learner should have inspected:

  • one clean separation between train() and eval()
  • one validation pass that uses torch.no_grad()
  • one pair of training and validation curves
  • one best-validation checkpoint versus final-epoch comparison
  • one optimizer or regularization change that improved or worsened the validation story
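The best-validation versus final-epoch comparison can be sketched directly; the toy model, data, and epoch count below are illustrative assumptions:

```python
import copy

import torch
from torch import nn

# Toy setup; the model, data, and hyperparameters are illustrative.
torch.manual_seed(0)
model = nn.Linear(5, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()
x_train, y_train = torch.randn(32, 5), torch.randn(32, 1)
x_val, y_val = torch.randn(16, 5), torch.randn(16, 1)

best_val, best_state = float("inf"), None
for epoch in range(10):
    model.train()
    optimizer.zero_grad()
    loss_fn(model(x_train), y_train).backward()
    optimizer.step()

    model.eval()
    with torch.no_grad():
        val = loss_fn(model(x_val), y_val).item()
    if val < best_val:                         # keep the best-validation weights
        best_val = val
        best_state = copy.deepcopy(model.state_dict())

# Compare the selected checkpoint against the final epoch before trusting either.
model.eval()
with torch.no_grad():
    final_val = loss_fn(model(x_val), y_val).item()
print(f"best-val checkpoint: {best_val:.4f}  final epoch: {final_val:.4f}")
model.load_state_dict(best_state)              # restore the selected checkpoint
```

The deep copy matters: `state_dict()` returns references to live tensors, so without copying, "the best checkpoint" silently becomes the final weights.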

Common Failure Modes

  • validating without eval() or torch.no_grad()
  • trusting the final epoch because it is the last thing printed
  • reading the test result before the curves
  • changing architecture before the current recipe has been understood
  • treating lower training loss as proof of a better workflow
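The first failure mode is easy to demonstrate: with dropout in the model, two "validation" passes in train mode disagree on the same batch, while eval mode is deterministic. A minimal sketch, with an arbitrary toy architecture:

```python
import torch
from torch import nn

# Toy model with dropout; the layer sizes are arbitrary.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(8, 8), nn.Dropout(p=0.5), nn.Linear(8, 1))
x = torch.randn(4, 8)

model.train()
with torch.no_grad():
    a, b = model(x), model(x)      # dropout resamples between the two calls

model.eval()
with torch.no_grad():
    c, d = model(x), model(x)      # dropout disabled: passes are deterministic

print("train-mode passes match:", torch.equal(a, b))   # almost surely False
print("eval-mode passes match:", torch.equal(c, d))    # True
```

Note that `no_grad()` alone does not fix this: it disables gradient tracking, not dropout. Only `eval()` switches the layer behavior.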

Exit Standard

Before leaving this track, the learner should be able to:

  • explain which recipe was selected and why
  • justify the checkpoint choice from validation rather than habit
  • name the clearest sign of underfitting or overfitting in the curves
  • say what the next smallest recipe change should be

That is enough to move into ResNet, BERT, and Fine-Tuning without carrying weak training habits forward.