Transfer and Fine-Tuning

What This Is

This page is about one adaptation decision: how much of a pretrained model to update. The options are to

  • keep the pretrained backbone frozen
  • partially unfreeze it
  • use a smaller adaptation surface
  • fine-tune the whole model

The right answer depends on label count, domain gap, and how much evidence you have that reuse is already working.

When You Use It

  • labeled data is limited
  • a strong pretrained backbone exists
  • training from scratch is expensive or unstable
  • you need to compare frozen reuse against deeper adaptation

Start With The Adaptation Ladder

Use the smallest adaptation that still works:

  1. frozen backbone plus new head
  2. a smaller adaptation surface, such as PEFT or an adapter-style update
  3. partial unfreeze of later blocks
  4. full fine-tuning only if the simpler steps fail clearly

This order keeps reuse honest and reduces overfitting risk.
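Rung 2 of the ladder can be sketched as a small residual bottleneck trained on top of a frozen block. This is a minimal illustration of the adapter idea, assuming a PyTorch backbone; the `Adapter` class and its dimensions are illustrative, not a library API.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Small residual bottleneck trained on top of a frozen block
    (an illustrative adapter-style module, not a library API)."""
    def __init__(self, dim, bottleneck=16):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        # Zero-init the up-projection so the adapter starts as an identity
        # and training begins exactly from the frozen model's behavior.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x):
        # Residual connection keeps the pretrained signal intact.
        return x + self.up(torch.relu(self.down(x)))
```

Because only the adapter's parameters are trainable, the adaptation surface stays far smaller than the backbone itself.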

Fast Decision Table

| Situation | Better first move | Main warning |
| --- | --- | --- |
| few labels, low domain gap | frozen backbone plus head | full fine-tuning may memorize fast |
| moderate labels, moderate gap | partial unfreeze | backbone LR can still be too high |
| large labeled set, close domain | full fine-tuning may be justified | still compare against a simpler reuse baseline |
| few labels, large gap | test frozen reuse and a scratch baseline first | negative transfer is real |

Minimal Freeze Pattern

```python
# Freeze every backbone parameter so only the new head receives gradient updates.
for parameter in backbone.parameters():
    parameter.requires_grad = False
```

That is the first honest reuse check. If the frozen representation already works, deeper adaptation may not be necessary.
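The full frozen-reuse setup adds a new head on top. A minimal sketch, assuming a PyTorch workflow; the toy `backbone` here stands in for a loaded pretrained checkpoint, and the layer sizes are illustrative.

```python
import torch
import torch.nn as nn

# Toy stand-in for a pretrained backbone; in practice this would be
# loaded from a checkpoint (names and sizes here are illustrative).
backbone = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 64))
for parameter in backbone.parameters():
    parameter.requires_grad = False

head = nn.Linear(64, 10)  # the only trainable part
model = nn.Sequential(backbone, head)

# Hand the optimizer only the parameters that can actually move.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-3)
```

If this baseline is competitive on validation, the adaptation ladder says to stop here.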

What To Inspect First

Before unfreezing more of the model, inspect:

  • the frozen baseline
  • the validation gap against a scratch baseline
  • whether the domain-specific slice still fails
  • whether the backbone is moving too aggressively

If the frozen result is already competitive, escalation needs a strong reason.
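The last inspection point, whether the backbone is moving too aggressively, can be made concrete by measuring how far each weight tensor has drifted from its pretrained value. A sketch under PyTorch assumptions; the helper name `parameter_drift` is hypothetical.

```python
import copy
import torch
import torch.nn as nn

def parameter_drift(model, reference):
    """L2 distance of each weight tensor from its pretrained value.
    Large drift in early backbone layers is one sign the backbone is
    moving too aggressively (helper name is illustrative)."""
    ref = dict(reference.named_parameters())
    return {name: (p.detach() - ref[name].detach()).norm().item()
            for name, p in model.named_parameters()}

# Snapshot before adaptation, then compare after a few epochs:
# pretrained_copy = copy.deepcopy(model)
# ... train ...
# print(parameter_drift(model, pretrained_copy))
```

Checking drift per layer also shows whether a partial unfreeze actually kept the frozen layers fixed.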

Failure Pattern

The common failure is treating full fine-tuning as the default.

That usually hides one of these realities:

  • the frozen representation was already enough
  • the label count is too small for full adaptation
  • the domain gap is large enough that naive transfer hurts
  • the training recipe is unstable, not the adaptation depth

Common Mistakes

  • skipping the frozen baseline
  • changing unfreeze depth, learning rate, and preprocessing at the same time
  • using the same LR for head and backbone without justification
  • calling a tiny validation gain proof that full fine-tuning is worth the extra risk
  • ignoring negative transfer when scratch or frozen reuse is already stronger
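The third mistake, one shared learning rate for head and backbone, is avoidable with optimizer parameter groups. A minimal sketch with toy modules; the specific rates are illustrative, not a recipe.

```python
import torch
import torch.nn as nn

# Toy modules standing in for a pretrained backbone and a new head.
backbone = nn.Linear(64, 64)
head = nn.Linear(64, 10)

# Separate parameter groups: the fresh head learns quickly while the
# pretrained backbone moves roughly 100x more slowly.
optimizer = torch.optim.AdamW([
    {"params": head.parameters(), "lr": 1e-3},
    {"params": backbone.parameters(), "lr": 1e-5},
])
```

Keeping the backbone rate much lower is one way to unfreeze without letting the pretrained representation get overwritten early in training.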

A Good Transfer Note

After one run, the learner should be able to say:

  • what was frozen
  • what was trainable
  • why that depth was chosen
  • what the validation result said compared with simpler reuse
  • what evidence would justify unfreezing more

Practice

  1. Compare a frozen backbone against a partially unfrozen version.
  2. Use separate learning rates for head and backbone and explain why.
  3. Decide when PEFT or smaller adaptation is safer than full unfreeze.
  4. Name one sign of negative transfer.
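For practice item 4, the simplest operational sign of negative transfer is a pretrained model whose validation score falls below the scratch baseline. A sketch; the function name and `margin` tolerance are illustrative.

```python
def shows_negative_transfer(transfer_score, scratch_score, margin=0.0):
    """One operational sign of negative transfer: the transferred model's
    validation score falls below the scratch baseline. The margin is an
    illustrative tolerance for run-to-run noise."""
    return transfer_score < scratch_score - margin
```

When this flag fires, the decision table above points back to testing frozen reuse and the scratch baseline before escalating adaptation depth.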

Longer Connection

Continue with ResNet, BERT, and Fine-Tuning for the full track, Representation Reuse and Embedding Transfer for frozen reuse workflows, and Learning Rate Schedulers when the adaptation recipe is the main bottleneck.