Transfer and Fine-Tuning

What This Is

This page is about one adaptation decision: how much of a pretrained model to update. The options are to

  • keep the pretrained backbone frozen
  • partially unfreeze it
  • use a smaller adaptation surface
  • fine-tune the whole model

The right answer depends on label count, domain gap, and how much evidence you have that reuse is already working.

When You Use It

  • labeled data is limited
  • a strong pretrained backbone exists
  • training from scratch is expensive or unstable
  • you need to compare frozen reuse against deeper adaptation

Start With The Adaptation Ladder

Use the smallest adaptation that still works:

  1. frozen backbone plus new head
  2. a smaller adaptation surface, such as PEFT or an adapter-style update
  3. partial unfreeze of later blocks
  4. full fine-tuning only if the simpler steps fail clearly

This order keeps reuse honest and reduces overfitting risk.
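Rung 2 of the ladder can be sketched as a small residual bottleneck trained on top of a frozen block. This is a minimal illustration of the adapter idea, assuming a PyTorch backbone; the `Adapter` class and its dimensions are illustrative, not a library API.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Small residual bottleneck trained on top of a frozen block
    (an illustrative adapter-style module, not a library API)."""
    def __init__(self, dim, bottleneck=16):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        # Zero-init the up-projection so the adapter starts as an identity
        # and training begins exactly from the frozen model's behavior.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x):
        # Residual connection keeps the pretrained signal intact.
        return x + self.up(torch.relu(self.down(x)))
```

Because only the adapter's parameters are trainable, the adaptation surface stays far smaller than the backbone itself.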

Fast Decision Table

| Situation | Better first move | Main warning |
| --- | --- | --- |
| few labels, low domain gap | frozen backbone plus head | full fine-tuning may memorize fast |
| moderate labels, moderate gap | partial unfreeze | backbone LR can still be too high |
| large labeled set, close domain | full fine-tuning may be justified | still compare against a simpler reuse baseline |
| few labels, large gap | test frozen reuse and a scratch baseline first | negative transfer is real |

Minimal Freeze Pattern

```python
# Freeze every backbone parameter so only the new head receives gradient updates.
for parameter in backbone.parameters():
    parameter.requires_grad = False
```

That is the first honest reuse check. If the frozen representation already works, deeper adaptation may not be necessary.
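The full frozen-reuse setup adds a new head on top. A minimal sketch, assuming a PyTorch workflow; the toy `backbone` here stands in for a loaded pretrained checkpoint, and the layer sizes are illustrative.

```python
import torch
import torch.nn as nn

# Toy stand-in for a pretrained backbone; in practice this would be
# loaded from a checkpoint (names and sizes here are illustrative).
backbone = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 64))
for parameter in backbone.parameters():
    parameter.requires_grad = False

head = nn.Linear(64, 10)  # the only trainable part
model = nn.Sequential(backbone, head)

# Hand the optimizer only the parameters that can actually move.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-3)
```

If this baseline is competitive on validation, the adaptation ladder says to stop here.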

What To Inspect First

Before unfreezing more of the model, inspect:

  • the frozen baseline
  • the validation gap against a scratch baseline
  • whether the domain-specific slice still fails
  • whether the backbone is moving too aggressively

If the frozen result is already competitive, escalation needs a strong reason.
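The last inspection point, whether the backbone is moving too aggressively, can be made concrete by measuring how far each weight tensor has drifted from its pretrained value. A sketch under PyTorch assumptions; the helper name `parameter_drift` is hypothetical.

```python
import copy
import torch
import torch.nn as nn

def parameter_drift(model, reference):
    """L2 distance of each weight tensor from its pretrained value.
    Large drift in early backbone layers is one sign the backbone is
    moving too aggressively (helper name is illustrative)."""
    ref = dict(reference.named_parameters())
    return {name: (p.detach() - ref[name].detach()).norm().item()
            for name, p in model.named_parameters()}

# Snapshot before adaptation, then compare after a few epochs:
# pretrained_copy = copy.deepcopy(model)
# ... train ...
# print(parameter_drift(model, pretrained_copy))
```

Checking drift per layer also shows whether a partial unfreeze actually kept the frozen layers fixed.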

Failure Pattern

The common failure is treating full fine-tuning as the default.

That usually hides one of these realities:

  • the frozen representation was already enough
  • the label count is too small for full adaptation
  • the domain gap is large enough that naive transfer hurts
  • the training recipe is unstable, not the adaptation depth

Common Mistakes

  • skipping the frozen baseline
  • changing unfreeze depth, learning rate, and preprocessing at the same time
  • using the same LR for head and backbone without justification
  • calling a tiny validation gain proof that full fine-tuning is worth the extra risk
  • ignoring negative transfer when scratch or frozen reuse is already stronger
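The third mistake, one shared learning rate for head and backbone, is avoidable with optimizer parameter groups. A minimal sketch with toy modules; the specific rates are illustrative, not a recipe.

```python
import torch
import torch.nn as nn

# Toy modules standing in for a pretrained backbone and a new head.
backbone = nn.Linear(64, 64)
head = nn.Linear(64, 10)

# Separate parameter groups: the fresh head learns quickly while the
# pretrained backbone moves roughly 100x more slowly.
optimizer = torch.optim.AdamW([
    {"params": head.parameters(), "lr": 1e-3},
    {"params": backbone.parameters(), "lr": 1e-5},
])
```

Keeping the backbone rate much lower is one way to unfreeze without letting the pretrained representation get overwritten early in training.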

A Good Transfer Note

After one run, the learner should be able to say:

  • what was frozen
  • what was trainable
  • why that depth was chosen
  • what the validation result said compared with simpler reuse
  • what evidence would justify unfreezing more

Practice

  1. Compare a frozen backbone against a partially unfrozen version.
  2. Use separate learning rates for head and backbone and explain why.
  3. Decide when PEFT or smaller adaptation is safer than full unfreeze.
  4. Name one sign of negative transfer.
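For practice item 4, the simplest operational sign of negative transfer is a pretrained model whose validation score falls below the scratch baseline. A sketch; the function name and `margin` tolerance are illustrative.

```python
def shows_negative_transfer(transfer_score, scratch_score, margin=0.0):
    """One operational sign of negative transfer: the transferred model's
    validation score falls below the scratch baseline. The margin is an
    illustrative tolerance for run-to-run noise."""
    return transfer_score < scratch_score - margin
```

When this flag fires, the decision table above points back to testing frozen reuse and the scratch baseline before escalating adaptation depth.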

Longer Connection

Continue with ResNet, BERT, and Fine-Tuning for the full track, Representation Reuse and Embedding Transfer for frozen reuse workflows, and Learning Rate Schedulers when the adaptation recipe is the main bottleneck.