Steering Frozen Generative Models
When the weights of a generator are frozen and you cannot fine-tune, you can still make it produce the output you want — by editing the inputs it sees. Text prompts, text embeddings, and initial latents are all knobs. Learning small edits on those knobs is the technique.
What This Is
Given a frozen generator G(text_embedding, latent_z) → image (or text, or audio), you want to solve:
find δ_text, δ_z such that G(text_emb + δ_text, z + δ_z) matches some objective
subject to ||δ_text|| <= ε_text, ||δ_z|| <= ε_z
No weight in G is updated. Only the deltas are learned. The deltas can be:
- a single token embedding added to the text encoder's vocabulary (textual inversion, the "new concept" case)
- a prompt embedding edit that shifts the generation toward a target concept
- an initial-latent edit that primes the sampling path
- a classifier-free guidance scale that amplifies or dampens the conditioning signal
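The constrained problem above can be sketched on a toy: a minimal projected-gradient loop, assuming a frozen linear map stands in for G and an L2 ball stands in for the ε_text budget. The names `W`, `project_l2`, `eps_text`, and the toy target are illustrative assumptions, not part of any real generator.

```python
import numpy as np

rng = np.random.default_rng(0)
emb_dim, img_dim = 8, 16

# Frozen toy "generator": a fixed linear map from text embedding to image.
W = rng.normal(size=(img_dim, emb_dim))
W_before = W.copy()  # kept only to verify that W is never modified

def G(text_emb):
    return W @ text_emb

def project_l2(delta, eps):
    """Project delta back onto the L2 ball of radius eps."""
    norm = np.linalg.norm(delta)
    return delta if norm <= eps else delta * (eps / norm)

text_emb = rng.normal(size=emb_dim)
# Toy objective: match an output that a shifted embedding would produce.
target = G(text_emb + 0.5 * rng.normal(size=emb_dim))
eps_text = 1.0
# Step size 1/L, where L = 2 * sigma_max(W)^2 is the gradient's Lipschitz constant.
lr = 1.0 / (2.0 * np.linalg.norm(W, 2) ** 2)

delta = np.zeros(emb_dim)
loss_start = np.sum((G(text_emb + delta) - target) ** 2)
for _ in range(500):
    residual = G(text_emb + delta) - target
    grad = 2.0 * W.T @ residual  # gradient of the squared error w.r.t. delta
    delta = project_l2(delta - lr * grad, eps_text)
loss_end = np.sum((G(text_emb + delta) - target) ** 2)
```

Only `delta` changes across iterations; the projection step is what enforces the norm budget after every update.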
Three named techniques you should recognize:
- Textual Inversion (Gal 2022). Learn one new token embedding v* by optimizing E[ ||G("a photo of v*", z) - x_target||² ] over a small set of target images. At inference you just use v* in prompts.
- Prompt-to-Prompt / Embedding Arithmetic. Find a direction in text-embedding space that corresponds to adding a concept (e.g., emb("with fire hydrant") - emb("") averaged over contexts). Add a scaled version of that direction at inference.
- Classifier-Free Guidance tuning. Rerun sampling with different guidance scales; higher scales push harder toward the conditioning; too high produces artifacts. One scalar, often the cheapest knob.
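The guidance scale in the last item enters sampling through one line of arithmetic. A minimal sketch of the usual classifier-free guidance combination, assuming the model has already produced a conditional and an unconditional noise prediction (the function name `cfg_combine` is illustrative):

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, scale):
    """Extrapolate from the unconditional prediction toward the conditional one.

    scale = 0 ignores the prompt, scale = 1 is plain conditional sampling,
    scale > 1 amplifies the conditioning signal.
    """
    eps_uncond = np.asarray(eps_uncond, dtype=float)
    eps_cond = np.asarray(eps_cond, dtype=float)
    return eps_uncond + scale * (eps_cond - eps_uncond)
```

Because the result is a linear extrapolation, large scales push the prediction far outside the range of either input, which is one way to see why very high scales produce artifacts.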
When You Use It
- you must not change the generator's weights (task constraint, license constraint, or compute constraint)
- you need the generator to produce a specific concept, style, or entity not in its training set
- you need the generator to combine two concepts in a way the text encoder does not naturally bridge
- the edit budget is small — you can learn one token embedding or a handful of scalars, not a full LoRA
Do Not Use It When
- you have permission and compute to fine-tune the generator (full or LoRA) — that is usually faster and more reliable
- the model has a safety filter that rejects your target — you cannot steer around a server-side filter with client-side edits
- the objective is a hard constraint (e.g., "output must contain a legally specific string") — generative steering is a soft push, not a guarantee
Textual Inversion In Practice
You want the generator to learn a new concept v* from a handful of target images x_1, ..., x_K.
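import random
import torch

# embedding_dim, N, lr, targets, random_timestep, q_sample, G and
# text_embedding_with are placeholders supplied by your diffusion setup.
v = torch.nn.Parameter(torch.randn(embedding_dim))  # one new token embedding
for step in range(N):
    x = random.choice(targets)                     # one of the K target images
    t = random_timestep()
    noise = torch.randn_like(x)
    x_noisy = q_sample(x, t, noise)                # forward-diffuse x to timestep t
    pred_noise = G.unet(x_noisy, t, text_embedding_with(v))
    loss = ((pred_noise - noise) ** 2).mean()      # noise-prediction loss
    loss.backward()                                # gradient reaches v through the frozen UNet
    with torch.no_grad():
        v -= lr * v.grad                           # plain SGD step on v only
    v.grad = None                                  # reset so gradients do not accumulate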
The UNet is frozen. Only v is updated. Typical budget: 3-10 target images, a few thousand steps, and you end up with one token you can drop into any prompt.
Embedding-Arithmetic Steering In Practice
When you do not have target images, but you know the concept as text, you can compute a direction:
import torch

base_contexts = ["a photo of a cow", "a photo of a cat", ...]
target_contexts = ["a photo of a cow with a fire hydrant", "a photo of a cat with a fire hydrant", ...]
# Average the (target - base) embedding difference over the paired contexts.
direction = torch.stack([text_encoder(t) - text_encoder(b) for b, t in zip(base_contexts, target_contexts)]).mean(dim=0)
# at inference:
edited_embedding = text_encoder(user_prompt) + α * direction
The scalar α is the knob — α = 0 leaves the prompt unchanged, α too high produces artifacts where the concept dominates.
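To see the role of α without a real text encoder, here is a toy sketch. It assumes a hypothetical encoder, `toy_text_encoder`, that sums fixed orthonormal word vectors; real encoders are contextual and will not behave this cleanly. The point is only the mechanics of computing a direction and sweeping its scale.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 32
# Hypothetical stand-in for a text encoder: each new word gets one column
# of a random orthonormal basis, and a prompt is the sum of its word vectors.
_basis, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
_word_index = {}

def toy_text_encoder(prompt):
    vec = np.zeros(dim)
    for word in prompt.lower().split():
        idx = _word_index.setdefault(word, len(_word_index))
        vec += _basis[:, idx]
    return vec

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom)

base_contexts = ["a photo of a cow", "a photo of a cat", "a photo of a dog"]
target_contexts = [b + " with a fire hydrant" for b in base_contexts]

# Average the per-context difference to get the concept direction.
direction = np.mean(
    [toy_text_encoder(t) - toy_text_encoder(b)
     for b, t in zip(base_contexts, target_contexts)],
    axis=0,
)

concept = toy_text_encoder("fire hydrant")
user = toy_text_encoder("a painting of a street")
# Similarity of the edited embedding to the concept at each steering strength.
sweep = {
    alpha: cosine(user + alpha * direction, concept)
    for alpha in (0.0, 0.25, 0.5, 1.0, 2.0, 4.0)
}
```

In this toy the similarity to the concept rises steadily with α; on a real model the same sweep is where you look for the point at which the concept starts to dominate.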
Initial-Latent Priming
Diffusion sampling starts from Gaussian noise z_T. You can pick a z_T that, when denoised under a fixed prompt, tends to produce the concept you want:
- generate many samples, rank by a concept classifier, keep the latents behind the highest-scoring samples
- average or interpolate between them to get a "primed" latent
- at inference, start from a noisy version of this primed latent instead of pure noise
This is less principled but often effective when the budget on other deltas is tight.
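The three steps can be sketched end to end on a toy. This assumes a hypothetical linear `generate` and a linear `concept_score` in place of a real sampler and classifier; the norm rescaling and the variance-preserving noise mix are design choices of this sketch, not requirements of the method.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim = 16

# Hypothetical frozen pieces: a linear "generator" and a linear concept scorer.
W = rng.normal(size=(32, latent_dim))
concept_probe = rng.normal(size=32)

def generate(z):
    return z @ W.T  # accepts a batch of latents, one per row

def concept_score(images):
    return images @ concept_probe

# 1. Generate many samples and rank them by the concept scorer.
latents = rng.normal(size=(512, latent_dim))
scores = concept_score(generate(latents))
top = latents[np.argsort(scores)[-16:]]

# 2. Average the best latents, then restore the norm a Gaussian draw would have
#    (averaging shrinks the vector toward the origin).
primed = top.mean(axis=0)
primed *= np.sqrt(latent_dim) / np.linalg.norm(primed)

# 3. Start from a noisy version of the primed latent.
def primed_start(n, noise_frac=0.5):
    noise = rng.normal(size=(n, latent_dim))
    return np.sqrt(1.0 - noise_frac) * primed + np.sqrt(noise_frac) * noise

primed_mean = concept_score(generate(primed_start(256))).mean()
random_mean = concept_score(generate(rng.normal(size=(256, latent_dim)))).mean()
```

`noise_frac` trades concept strength against sample diversity: at 0 every run starts from the same latent, at 1 the priming is gone.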
What To Inspect
- sample grid — generate a 4×4 grid of samples with and without the edit; visually confirm the concept shows up
- concept-classifier score distribution — if you have an external classifier that detects the target concept, score samples with and without the edit; look at the full distribution, not just the mean
- unintended changes — does the edit also break unrelated prompts? A "fire hydrant direction" should not also change "a photo of a sunset"
- magnitude sweep — generate samples at α ∈ {0.25, 0.5, 1.0, 2.0, 4.0} and find the sweet spot
- training-image overfit (for textual inversion) — does v* reproduce the training images verbatim? That means you overfit and lost generality
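For the score-distribution check, a small summary helper makes the "not just the mean" point concrete. The scores below are synthetic numbers invented for illustration, not measurements; `summarize` and the 0.5 threshold are likewise assumptions of this sketch.

```python
import numpy as np

def summarize(scores, threshold=0.5):
    """Report quantiles and hit rate alongside the mean."""
    scores = np.asarray(scores, dtype=float)
    p10, median, p90 = np.quantile(scores, [0.1, 0.5, 0.9])
    return {
        "mean": float(scores.mean()),
        "p10": float(p10),
        "median": float(median),
        "p90": float(p90),
        "hit_rate": float((scores >= threshold).mean()),
    }

# Synthetic classifier scores (illustrative only).
baseline = np.array([0.05, 0.10, 0.10, 0.15, 0.20, 0.10, 0.05, 0.15])
# An edit that makes a few samples score very high and barely moves the rest.
edited = np.array([0.10, 0.10, 0.15, 0.15, 0.20, 0.95, 0.98, 0.97])

report = {"baseline": summarize(baseline), "edited": summarize(edited)}
```

In this made-up case the mean rises sharply while the median barely moves and only three of eight samples clear the threshold, which is the pattern a mean-only comparison would hide.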
Failure Pattern
- embedding explosion. Direction magnitudes grow without bound because there is no weight decay. Sample quality degrades. Fix: L2-regularize the delta or clip its norm.
- concept leakage. The edit is "too strong": every sample now has a fire hydrant, even for prompts where that is absurd. Fix: reduce α, or apply α * direction only when the user prompt semantically allows it.
- textual-inversion overfit. With too few training images or too many steps, v* memorizes exact training images. Fix: fewer steps, more diverse augmentations, or a smaller token-embedding learning rate.
- classifier-free guidance clipping. Very high CFG scales (>15) produce oversaturated, artifacted images. The visual effect is "burned" colors and unnatural contrast. Fix: keep CFG in the 5-12 range as a starting point; the usable range is model-dependent.
- evaluating with the wrong metric. "Does the concept appear?" is what the task wants, but you measure FID or CLIP-score. Those do not directly capture concept presence. Use a task-specific classifier or the exact evaluator the task uses.
Quick Checks
- is the generator truly frozen? (all parameters with requires_grad=False; only the delta has requires_grad=True)
- is the delta magnitude bounded? (norm clip or L2 regularization in the loss)
- are you evaluating on the same classifier/metric the task uses, not a proxy?
- did you compare against the no-edit baseline? ("my edit works" means "my edit works compared to no edit with the same prompts")
- have you checked unrelated prompts still look normal?
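The first check can also be done independently of any framework flags by fingerprinting the generator's weights before and after optimization. A minimal sketch, assuming the parameters are available as a name-to-array dict (the `fingerprint` helper and the toy parameters are illustrative):

```python
import hashlib
import numpy as np

def fingerprint(params):
    """Hash every parameter array so any change to the weights is detectable."""
    h = hashlib.sha256()
    for name in sorted(params):
        h.update(name.encode())
        h.update(np.ascontiguousarray(params[name]).tobytes())
    return h.hexdigest()

rng = np.random.default_rng(0)
generator_params = {
    "decoder.weight": rng.normal(size=(4, 3)),
    "decoder.bias": np.zeros(4),
}
before = fingerprint(generator_params)

# Stand-in optimization loop: only delta is updated.
W_g = generator_params["decoder.weight"]
delta = np.zeros(3)
for _ in range(10):
    residual = W_g @ delta + generator_params["decoder.bias"] - 1.0
    delta -= 0.01 * (W_g.T @ residual)

after = fingerprint(generator_params)
generator_is_frozen = before == after
```

A mismatch between `before` and `after` means something other than the delta was updated, whatever the gradient flags say.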
Practice
Run academy/labs/generative-model-steering/src/steering_workflow.py. The script:
- trains a small conditional pixel generator (toy VAE decoder, not a real diffusion model — just enough to demonstrate the mechanics) on a synthetic shape dataset
- freezes the generator and learns a textual-inversion-style new token embedding that makes it produce a held-out shape class it was not trained to generate
- adds an embedding-arithmetic experiment: compute a "size → large" direction and demonstrate it transfers
- demonstrates the "LLM-judge-in-the-loop" black-box search pattern: given a frozen scorer and a candidate generator output, search over prompts with a small beam to maximize the scorer
You should leave able to explain when frozen-model steering is the right move, what textual inversion optimizes, and why the same search pattern ("optimize inputs against a frozen evaluator") applies whether the evaluator is a classifier, a generator, or an LLM judge.
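The black-box search pattern needs no gradients at all. Below is a minimal beam-search sketch, not the lab script: `toy_scorer` is a hypothetical stand-in that scores the prompt directly, where a real pipeline would score the generator's output for that prompt.

```python
from typing import Callable, List, Sequence, Tuple

def beam_search_prompts(
    seed: str,
    vocab: Sequence[str],
    score: Callable[[str], float],
    beam_width: int = 3,
    steps: int = 3,
) -> Tuple[str, float]:
    """Grow the prompt one word at a time, keeping the best few candidates."""
    beam: List[Tuple[str, float]] = [(seed, score(seed))]
    for _ in range(steps):
        # Keep current prompts as candidates so the best score never decreases.
        candidates = dict(beam)
        for prompt, _ in beam:
            for word in vocab:
                extended = f"{prompt} {word}"
                if extended not in candidates:
                    candidates[extended] = score(extended)
        # Sort by score, then alphabetically so ties break deterministically.
        ranked = sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))
        beam = ranked[:beam_width]
    return beam[0]

TARGET_WORDS = {"red", "large", "circle"}

def toy_scorer(prompt: str) -> float:
    """Hypothetical frozen judge: rewards target words, penalizes length."""
    words = prompt.split()
    return len(TARGET_WORDS & set(words)) - 0.1 * len(words)

vocab = ["red", "blue", "large", "small", "circle", "square"]
best_prompt, best_score = beam_search_prompts("a shape", vocab, toy_scorer)
```

The scorer is only ever called, never differentiated, so the same loop works when the evaluator is a classifier, a generator-plus-classifier pipeline, or an LLM judge behind an API.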
Longer Connection
"Optimize inputs against a frozen model" is a design pattern with many instances:
- textual inversion — learn an input token to match a target image distribution
- adversarial examples — learn an input perturbation to flip a classifier
- prompt engineering by search — search over prompts to maximize an external judge's score (see Black-Box LLM Optimization)
- adversarial patch attacks — learn a fixed image patch that, when overlaid, triggers a target class
The family of techniques is sometimes called input-space optimization. It is related to, but distinct from, activation steering, which edits a model's internal activations rather than its inputs. These techniques are powerful precisely because they do not require retraining the target model, so they apply where retraining is impossible (API-only models, licensed weights, compute-limited settings). They are also the natural way to approach any task framed as "the model is frozen, here is a budget for deltas on the inputs".
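The adversarial-example instance of the pattern fits in a few lines. A minimal sketch, assuming a frozen linear classifier with made-up weights: the input moves against the sign of the logit's gradient, under an L-infinity budget chosen just large enough to cross the decision boundary.

```python
import numpy as np

# Frozen toy classifier (illustrative weights): class 1 if w.x + b > 0.
w = np.array([1.5, -2.0, 0.5])
b = 0.1

def predict(x):
    return 1 if w @ x + b > 0 else 0

x = np.array([0.2, -0.1, 0.4])
margin = w @ x + b  # for a linear model, the gradient of the logit w.r.t. x is w

# Smallest L-infinity budget that reaches the boundary is |margin| / ||w||_1;
# take 10% more so the perturbed input lands on the other side.
eps = 1.1 * abs(margin) / np.abs(w).sum()
x_adv = x - np.sign(margin) * eps * np.sign(w)
```

The classifier's weights are untouched; only the input moves, and the budget `eps` plays the same role as ε_text and ε_z in the steering setup.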