Transfer and Fine-Tuning¶
What This Is¶
This page is about one adaptation decision. Given a strong pretrained backbone, should you:
- keep it frozen
- partially unfreeze it
- use a smaller adaptation surface
- fine-tune the whole model
The right answer depends on label count, domain gap, and how much evidence you have that reuse is already working.
When You Use It¶
- labeled data is limited
- a strong pretrained backbone exists
- training from scratch is expensive or unstable
- you need to compare frozen reuse against deeper adaptation
Start With The Adaptation Ladder¶
Use the smallest adaptation that still works:
- frozen backbone plus new head
- a smaller adaptation surface, such as PEFT or an adapter-style update
- partial unfreeze of later blocks
- full fine-tuning only if the simpler steps fail clearly
This order keeps reuse honest and reduces overfitting risk.
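The frozen and partial-unfreeze rungs of the ladder can be sketched as freezing policies in PyTorch. The `backbone` here is a toy stand-in, and the helper names are illustrative:

```python
import torch.nn as nn


def freeze_all(backbone: nn.Module) -> None:
    """Rung 1: frozen backbone — only a new head will train."""
    for p in backbone.parameters():
        p.requires_grad = False


def unfreeze_last_blocks(backbone: nn.Sequential, n: int) -> None:
    """Rung 3: partial unfreeze — re-enable gradients for the last n blocks."""
    for block in list(backbone)[-n:]:
        for p in block.parameters():
            p.requires_grad = True


# Toy backbone: three "blocks" of linear layers.
backbone = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 8), nn.Linear(8, 4))
freeze_all(backbone)
unfreeze_last_blocks(backbone, 1)

trainable = sum(p.numel() for p in backbone.parameters() if p.requires_grad)
total = sum(p.numel() for p in backbone.parameters())
print(trainable, total)  # 36 180 — only the last block's parameters train
```

Starting at rung 1 and only calling `unfreeze_last_blocks` after the frozen baseline is measured keeps each escalation a deliberate, recorded step.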
Fast Decision Table¶
| Situation | Better first move | Main warning |
|---|---|---|
| few labels, low domain gap | frozen backbone plus head | full fine-tuning may memorize fast |
| moderate labels, moderate gap | partial unfreeze | backbone LR can still be too high |
| large labeled set, close domain | full fine-tuning may be justified | still compare against a simpler reuse baseline |
| few labels, large gap | test frozen reuse and scratch baseline first | negative transfer is real |
Minimal Freeze Pattern¶
```python
# Freeze every backbone parameter; only the new head will receive gradients.
for parameter in backbone.parameters():
    parameter.requires_grad = False
```
That is the first honest reuse check. If the frozen representation already works, deeper adaptation may not be necessary.
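With the backbone frozen, the optimizer should only see the head's parameters. A minimal sketch with hypothetical `backbone` and `head` modules:

```python
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(16, 16), nn.Linear(16, 8))
head = nn.Linear(8, 2)  # new task head

for parameter in backbone.parameters():
    parameter.requires_grad = False

model = nn.Sequential(backbone, head)

# Pass only parameters with requires_grad=True to the optimizer.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)

n_trainable = sum(p.numel() for g in optimizer.param_groups for p in g["params"])
print(n_trainable)  # 8 * 2 + 2 = 18: only the head trains
```

Filtering on `requires_grad` also makes the trainable-parameter count easy to log, which is useful when writing the transfer note later.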
What To Inspect First¶
Before unfreezing more of the model, inspect:
- the frozen baseline
- the validation gap against a scratch baseline
- whether the domain-specific slice still fails
- whether the backbone is moving too aggressively
If the frozen result is already competitive, escalation needs a strong reason.
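One way to check whether the backbone is moving too aggressively is to snapshot the pretrained weights and track parameter drift during training. A sketch under assumed names, using a toy one-layer backbone:

```python
import copy
import torch
import torch.nn as nn

backbone = nn.Linear(4, 4)
initial = copy.deepcopy(backbone.state_dict())  # pretrained snapshot


def backbone_drift(module: nn.Module, snapshot: dict) -> float:
    """L2 distance between current parameters and the snapshot."""
    total = 0.0
    for name, p in module.state_dict().items():
        total += (p - snapshot[name]).pow(2).sum().item()
    return total ** 0.5


# One deliberately aggressive update (lr=1.0) moves the drift away from zero.
opt = torch.optim.SGD(backbone.parameters(), lr=1.0)
loss = backbone(torch.ones(1, 4)).sum()
loss.backward()
opt.step()
print(backbone_drift(backbone, initial))  # large drift after a single step
```

A drift curve that jumps in the first few steps is a sign the backbone learning rate is too high, independent of what the validation loss says.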
Failure Pattern¶
The common failure is treating full fine-tuning as the default.
That usually hides one of these realities:
- the frozen representation was already enough
- the label count is too small for full adaptation
- the domain gap is large enough that naive transfer hurts
- the training recipe is unstable, not the adaptation depth
Common Mistakes¶
- skipping the frozen baseline
- changing unfreeze depth, learning rate, and preprocessing at the same time
- using the same LR for head and backbone without justification
- calling a tiny validation gain proof that full fine-tuning is worth the extra risk
- ignoring negative transfer when scratch or frozen reuse is already stronger
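The separate-LR point above is usually implemented with optimizer parameter groups: a small learning rate for the pretrained backbone, a larger one for the fresh head. A hedged sketch with toy modules; the specific values are illustrative, not a recommendation:

```python
import torch
import torch.nn as nn

backbone = nn.Linear(8, 8)  # stands in for a pretrained backbone
head = nn.Linear(8, 2)      # freshly initialized task head

optimizer = torch.optim.AdamW([
    {"params": backbone.parameters(), "lr": 1e-5},  # gentle backbone updates
    {"params": head.parameters(), "lr": 1e-3},      # faster head learning
])

print([g["lr"] for g in optimizer.param_groups])  # [1e-05, 0.001]
```

Recording both rates makes the "why that depth was chosen" line in the transfer note concrete instead of implicit.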
A Good Transfer Note¶
After one run, the learner should be able to say:
- what was frozen
- what was trainable
- why that depth was chosen
- what the validation result said compared with simpler reuse
- what evidence would justify unfreezing more
Practice¶
- Compare a frozen backbone against a partially unfrozen version.
- Use separate learning rates for head and backbone and explain why.
- Decide when PEFT or smaller adaptation is safer than full unfreeze.
- Name one sign of negative transfer.
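For the PEFT practice item, a minimal LoRA-style adapter can be sketched in plain PyTorch: the base linear layer stays frozen, and only two small low-rank matrices train. This is an illustrative sketch of the idea, not the API of any specific PEFT library:

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen base linear plus a trainable low-rank update: W x + (B A) x."""

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # base weights stay frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())


layer = LoRALinear(nn.Linear(32, 32), rank=4)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 4*32 + 32*4 = 256, versus 1056 in the full layer
```

Because `B` is zero-initialized, the adapted layer starts out identical to the frozen base, so the frozen baseline is recovered exactly at step zero.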
Longer Connection¶
Continue with ResNet, BERT, and Fine-Tuning for the full track, Representation Reuse and Embedding Transfer for frozen reuse workflows, and Learning Rate Schedulers when the adaptation recipe is the main bottleneck.