Track 04

PyTorch Training Recipes

This track turns deep-learning mechanics into a stable workflow: clean loops, honest validation, readable curves, checkpoint discipline, and recipe changes that happen before architecture changes.

Primary Goal

Stabilize Training First

Use this track when the model itself is not yet the real bottleneck and the training workflow still needs to become trustworthy.

You Will Practice

Loops, Curves, Checkpoints

train() and eval(), no_grad(), optimizer and regularization choices, and validation-based checkpoint selection.
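The whole discipline fits in a few lines. A minimal sketch using a toy linear model on random data (the shapes, learning rate, and epoch count are illustrative assumptions, not part of the track's examples):

```python
import torch
from torch import nn

# Toy model and random data; everything here is illustrative.
torch.manual_seed(0)
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

x_train, y_train = torch.randn(64, 10), torch.randn(64, 1)
x_val, y_val = torch.randn(32, 10), torch.randn(32, 1)

for epoch in range(3):
    model.train()                  # training behavior for dropout/batch norm
    optimizer.zero_grad()
    loss = loss_fn(model(x_train), y_train)
    loss.backward()
    optimizer.step()

    model.eval()                   # inference behavior for the validation pass
    with torch.no_grad():          # no autograd bookkeeping during validation
        val_loss = loss_fn(model(x_val), y_val)
    print(f"epoch {epoch}: train={loss.item():.4f} val={val_loss.item():.4f}")
```

The point is the separation: the gradient step happens only inside `train()` mode, and the validation number is computed only inside `eval()` plus `no_grad()`.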

Exit Rule

One Defensible Recipe Note

You are done when you can say which recipe you trust, why the curves support it, and what the next smallest change should be.

Use This Track When

  • classical evaluation discipline already feels stable
  • you are ready to debug deep-learning workflows without hiding behind architecture changes
  • you want checkpointing, regularization, and curve reading to feel mechanical

What This Track Is Training

This track trains one practical rule:

  • change the recipe cleanly before you change the model creatively

That means the learner should be able to keep these explicit:

  • where training stops and validation begins
  • which checkpoint was selected and why
  • whether the curves show underfitting, overfitting, or unstable optimization
  • whether a regularization or optimizer change improved the validation story
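The curve read in the third point can be made mechanical. The sketch below labels a finished run with a crude heuristic; the `read_curves` helper and its thresholds are hypothetical assumptions, and real curve reading stays a judgment call:

```python
def read_curves(train_losses, val_losses, gap_tol=0.5, flat_tol=0.01):
    """Rough heuristic label for a finished run; thresholds are illustrative."""
    # Unstable optimization: validation loss rises at most steps.
    rises = sum(b > a for a, b in zip(val_losses, val_losses[1:]))
    if rises > len(val_losses) // 2:
        return "unstable optimization"
    # Overfitting: training keeps improving while validation lags far behind.
    if val_losses[-1] - train_losses[-1] > gap_tol:
        return "overfitting"
    # Underfitting: the training curve barely moved at all.
    if train_losses[0] - train_losses[-1] < flat_tol:
        return "underfitting"
    return "healthy"

print(read_curves([1.0, 0.6, 0.3, 0.1], [1.0, 0.7, 0.9, 1.1]))  # → overfitting
```

A helper like this is not a substitute for looking at the curves, but writing one forces the learner to state what "overfitting" or "unstable" means in numbers.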

First Session

Use this order:

  1. PyTorch Training Loops
  2. Optimizers and Regularization
  3. run academy/.venv/bin/python academy/examples/deep-learning-recipes/pytorch_training_loop_demo.py
  4. run academy/.venv/bin/python academy/examples/deep-learning-recipes/optimizer_regularization_demo.py
  5. write one sentence on whether the main risk is loop correctness, optimization, or regularization

Full Track Loop

For the complete workflow:

  1. review the two deep-learning topics in order
  2. run the loop and optimizer examples from repo root
  3. run academy/.venv/bin/python academy/labs/pytorch-training-recipes/src/training_recipe_workflow.py
  4. inspect the outputs before reading the final test result
  5. finish the matching exercises in academy/exercises/pytorch-training-recipes/
  6. keep one short recipe note with the checkpoint rule, curve read, selected recipe, and next change

What To Inspect

By the end of the track, the learner should have inspected:

  • one clean separation between train() and eval()
  • one validation pass that uses torch.no_grad()
  • one pair of training and validation curves
  • one best-validation checkpoint versus final-epoch comparison
  • one optimizer or regularization change that improved or worsened the validation story
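The best-validation versus final-epoch comparison can be sketched directly; the toy model, data, and epoch count below are illustrative assumptions:

```python
import copy

import torch
from torch import nn

# Toy setup; the model, data, and hyperparameters are illustrative.
torch.manual_seed(0)
model = nn.Linear(5, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()
x_train, y_train = torch.randn(32, 5), torch.randn(32, 1)
x_val, y_val = torch.randn(16, 5), torch.randn(16, 1)

best_val, best_state = float("inf"), None
for epoch in range(10):
    model.train()
    optimizer.zero_grad()
    loss_fn(model(x_train), y_train).backward()
    optimizer.step()

    model.eval()
    with torch.no_grad():
        val = loss_fn(model(x_val), y_val).item()
    if val < best_val:                         # keep the best-validation weights
        best_val = val
        best_state = copy.deepcopy(model.state_dict())

# Compare the selected checkpoint against the final epoch before trusting either.
model.eval()
with torch.no_grad():
    final_val = loss_fn(model(x_val), y_val).item()
print(f"best-val checkpoint: {best_val:.4f}  final epoch: {final_val:.4f}")
model.load_state_dict(best_state)              # restore the selected checkpoint
```

The deep copy matters: `state_dict()` returns references to live tensors, so without copying, "the best checkpoint" silently becomes the final weights.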

Common Failure Modes

  • validating without eval() or torch.no_grad()
  • trusting the final epoch because it is the last thing printed
  • reading the test result before the curves
  • changing architecture before the current recipe has been understood
  • treating lower training loss as proof of a better workflow
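The first failure mode is easy to demonstrate: with dropout in the model, two "validation" passes in train mode disagree on the same batch, while eval mode is deterministic. A minimal sketch, with an arbitrary toy architecture:

```python
import torch
from torch import nn

# Toy model with dropout; the layer sizes are arbitrary.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(8, 8), nn.Dropout(p=0.5), nn.Linear(8, 1))
x = torch.randn(4, 8)

model.train()
with torch.no_grad():
    a, b = model(x), model(x)      # dropout resamples between the two calls

model.eval()
with torch.no_grad():
    c, d = model(x), model(x)      # dropout disabled: passes are deterministic

print("train-mode passes match:", torch.equal(a, b))   # almost surely False
print("eval-mode passes match:", torch.equal(c, d))    # True
```

Note that `no_grad()` alone does not fix this: it disables gradient tracking, not dropout. Only `eval()` switches the layer behavior.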

Exit Standard

Before leaving this track, the learner should be able to:

  • explain which recipe was selected and why
  • justify the checkpoint choice from validation rather than habit
  • name the clearest sign of underfitting or overfitting in the curves
  • say what the next smallest recipe change should be

That is enough to move into ResNet, BERT, and Fine-Tuning without carrying weak training habits forward.