Steering Frozen Generative Models

When the weights of a generator are frozen and you cannot fine-tune, you can still make it produce the output you want — by editing the inputs it sees. Text prompts, text embeddings, and initial latents are all knobs. Learning small edits on those knobs is the technique.

What This Is

Given a frozen generator G(text_embedding, latent_z) → image (or text, or audio), you want to solve:

find   δ_text, δ_z    such that   G(text_emb + δ_text, z + δ_z)  matches some objective
subject to            ||δ_text|| <= ε_text,  ||δ_z|| <= ε_z

No weight in G is updated. Only the deltas are learned. The deltas can be:

a single new token embedding added to the text encoder's embedding table, paired with a new placeholder token in the tokenizer (textual inversion, the "new concept" case)
a prompt embedding edit that shifts the generation toward a target concept
an initial-latent edit that primes the sampling path
a classifier-free guidance scale (a scalar knob rather than an additive delta) that amplifies or dampens the conditioning signal
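As a minimal sketch of the constrained problem above, the toy below uses a frozen linear map as a stand-in for G and runs projected gradient descent on δ_text only. The names `W_text`, `W_z`, `generate`, `project_l2`, and `steer` are hypothetical, and the gradient is written by hand only because the toy is linear; a real generator would need autograd.

```python
import numpy as np

rng = np.random.default_rng(0)
W_text = rng.normal(size=(6, 4))   # frozen: never updated below
W_z = rng.normal(size=(6, 3))      # frozen: never updated below

def generate(text_emb, z):
    """Toy frozen generator G(text_emb, z)."""
    return W_text @ text_emb + W_z @ z

def project_l2(delta, eps):
    """Project delta onto the L2 ball of radius eps."""
    norm = np.linalg.norm(delta)
    return delta if norm <= eps else delta * (eps / norm)

def steer(text_emb, z, target, eps_text=0.5, lr=0.01, steps=200):
    """Learn delta_text only; the generator weights are read, never written."""
    delta = np.zeros_like(text_emb)
    for _ in range(steps):
        residual = generate(text_emb + delta, z) - target
        grad = 2.0 * W_text.T @ residual          # gradient of ||residual||^2 w.r.t. delta
        delta = project_l2(delta - lr * grad, eps_text)
    return delta

text_emb = rng.normal(size=4)
z = rng.normal(size=3)
# Build a reachable target by applying a small known edit.
target = generate(text_emb + np.array([0.3, -0.2, 0.1, 0.0]), z)

delta_text = steer(text_emb, z, target)
loss_before = np.sum((generate(text_emb, z) - target) ** 2)
loss_after = np.sum((generate(text_emb + delta_text, z) - target) ** 2)
```

The projection step is what enforces ||δ_text|| <= ε_text; dropping it turns the loop into unconstrained optimization.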

Three named techniques you should recognize:

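Textual Inversion (Gal et al., 2022). Learn one new token embedding v* by minimizing the generator's usual denoising loss, E[ ||ε - ε_θ(x_t, t, c("a photo of v*"))||² ], where x_t is a target image noised to timestep t, ε is the noise that was added, and c(·) is the text conditioning. The expectation runs over a small set of target images, timesteps, and noise draws. At inference you just use v* in prompts.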
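Embedding Arithmetic (often mentioned alongside Prompt-to-Prompt, which edits cross-attention maps rather than embeddings). Find a direction in text-embedding space that corresponds to adding a concept, e.g., emb("a photo of a cow with a fire hydrant") - emb("a photo of a cow"), averaged over several contexts. Add a scaled version of that direction at inference.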
Classifier-Free Guidance tuning. Rerun sampling with different guidance scales; higher scales push harder toward the conditioning; too high produces artifacts. One scalar, often the cheapest knob.
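For reference, the guidance scale usually enters sampling through a single extrapolation step. The sketch below shows the common formulation used in many diffusion implementations; the function name `cfg_combine` and the toy arrays are illustrative, and some papers parameterize the scale differently (for example as w with scale = 1 + w).

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, scale):
    """Classifier-free guidance: move from the unconditional noise prediction
    toward (and past) the conditional one. scale=1 recovers the conditional
    prediction; scale=0 ignores the prompt entirely."""
    return eps_uncond + scale * (eps_cond - eps_uncond)

# Toy noise predictions that differ only in the first coordinate.
eps_uncond = np.array([0.0, 1.0])
eps_cond = np.array([1.0, 1.0])

guided = {s: cfg_combine(eps_uncond, eps_cond, s) for s in (0.0, 1.0, 7.5)}
```

Because the scale multiplies the difference between the two predictions, large values extrapolate far outside anything the model predicted on its own, which is one way to see why very high scales produce artifacts.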

When You Use It

you must not change the generator's weights (task constraint, license constraint, or compute constraint)
you need the generator to produce a specific concept, style, or entity not in its training set
you need the generator to combine two concepts in a way the text encoder does not naturally bridge
the edit budget is small — you can learn one token embedding or a handful of scalars, not a full LoRA

Do Not Use It When

you have permission and compute to fine-tune the generator (full or LoRA) — that is usually faster and more reliable
the model has a safety filter that rejects your target — you cannot steer around a server-side filter with client-side edits
the objective is a hard constraint (e.g., "output must contain a legally specific string") — generative steering is a soft push, not a guarantee

Textual Inversion In Practice

You want the generator to learn a new concept v* from a handful of target images x_1, ..., x_K.

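import random
import torch

# G, targets, q_sample, random_timestep, text_embedding_with, embedding_dim, lr, N
# are placeholders supplied by your pipeline.
v = torch.nn.Parameter(torch.randn(embedding_dim))   # one new token embedding
optimizer = torch.optim.SGD([v], lr=lr)              # v is the only optimized tensor
for step in range(N):
    x = random.choice(targets)
    t = random_timestep()
    noise = torch.randn_like(x)
    x_noisy = q_sample(x, t, noise)                  # noise the target to timestep t
    pred_noise = G.unet(x_noisy, t, text_embedding_with(v))
    loss = torch.nn.functional.mse_loss(pred_noise, noise)
    optimizer.zero_grad()                            # otherwise gradients accumulate across steps
    loss.backward()
    optimizer.step()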

The UNet is frozen. Only v is updated. Typical budget: 3-10 target images, a few thousand steps, and you end up with one token you can drop into any prompt.
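The helper `text_embedding_with(v)` in the loop above is not spelled out. One plausible reading, sketched here with a hypothetical function name and toy sizes, is that it overwrites the placeholder token's row in the prompt's token-embedding sequence with the learned vector. In Stable Diffusion-style pipelines the spliced sequence would then pass through the frozen text encoder before reaching the UNet; that step is omitted here.

```python
import numpy as np

def splice_placeholder(token_embeddings, placeholder_position, v):
    """Return a copy of the prompt's token embeddings with the row at
    placeholder_position replaced by the learned vector v."""
    edited = token_embeddings.copy()
    edited[placeholder_position] = v
    return edited

# Toy setup (assumption): "a photo of <v*>" tokenizes to 4 positions, embedding_dim = 8.
rng = np.random.default_rng(0)
prompt_embeddings = rng.normal(size=(4, 8))
original = prompt_embeddings.copy()
v = rng.normal(size=8)

edited = splice_placeholder(prompt_embeddings, placeholder_position=3, v=v)
```

Only the placeholder row differs between `prompt_embeddings` and `edited`, which is why the learned token can be dropped into any prompt without touching the other words.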

Embedding-Arithmetic Steering In Practice

When you do not have target images, but you know the concept as text, you can compute a direction:

base_contexts = ["a photo of a cow", "a photo of a cat", ...]
target_contexts = ["a photo of a cow with a fire hydrant", "a photo of a cat with a fire hydrant", ...]
direction = mean([text_encoder(t) - text_encoder(b) for b, t in zip(base_contexts, target_contexts)])

# at inference:
edited_embedding = text_encoder(user_prompt) + α * direction

The scalar α is the knob — α = 0 leaves the prompt unchanged, α too high produces artifacts where the concept dominates.
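To make the recipe concrete, here is a self-contained toy version. The encoder is a deliberately crude assumption (a mean-pooled bag of deterministic pseudo-random word vectors), not a real text encoder, and all function names are hypothetical. It only illustrates the mechanics of averaging paired differences and scaling by α.

```python
import zlib
import numpy as np

DIM = 16

def word_vector(word):
    """Deterministic pseudo-random vector per word (toy vocabulary)."""
    seed = zlib.crc32(word.encode("utf-8"))
    return np.random.default_rng(seed).normal(size=DIM)

def toy_text_encoder(prompt):
    """Mean-pooled bag of words; a real encoder returns a token sequence."""
    return np.mean([word_vector(w) for w in prompt.lower().split()], axis=0)

def concept_direction(base_contexts, target_contexts):
    """Average the paired differences target - base over all contexts."""
    diffs = [toy_text_encoder(t) - toy_text_encoder(b)
             for b, t in zip(base_contexts, target_contexts)]
    return np.mean(diffs, axis=0)

def apply_direction(prompt, direction, alpha):
    return toy_text_encoder(prompt) + alpha * direction

animals = ["cow", "cat", "dog"]
base_contexts = [f"a photo of a {a}" for a in animals]
target_contexts = [f"{b} with a fire hydrant" for b in base_contexts]
direction = concept_direction(base_contexts, target_contexts)

# Projection of the edited embedding onto the direction grows linearly with alpha.
projections = [float(apply_direction("a photo of a horse", direction, a) @ direction)
               for a in (0.0, 0.5, 1.0, 2.0)]
```

With a real encoder the embeddings are per-token sequences, so the base and target prompts must be padded to the same length before subtracting.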

Initial-Latent Priming

Diffusion sampling starts from Gaussian noise z_T. You can pick a z_T that, when denoised with a fixed prompt, tends to produce the concept you want:

generate many samples, rank by a concept classifier, keep the latents behind the highest-scoring samples
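average or interpolate between them to get a "primed" latent (plain averaging shrinks the norm below what the sampler expects, so rescale the result or use spherical interpolation)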
at inference, start from a noisy version of this primed latent instead of pure noise

This is less principled but often effective when the budget on other deltas is tight.
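The three steps above can be sketched on a toy problem. Here `concept_score` stands in for "run the sampler, then score the image with a concept classifier", and the assumption that the concept depends on a single latent direction is purely for illustration. All names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 64
concept_axis = np.zeros(DIM)
concept_axis[0] = 1.0   # toy assumption: the concept lives on one latent direction

def concept_score(latents):
    """Toy stand-in for 'sample from the generator, then score with a classifier'."""
    return latents @ concept_axis

def primed_latent(num_candidates=512, keep=16):
    """Rank random latents by score, average the best, and restore the norm."""
    candidates = rng.normal(size=(num_candidates, DIM))
    top = candidates[np.argsort(concept_score(candidates))[-keep:]]
    mean = top.mean(axis=0)
    # Averaging shrinks the norm, so rescale to the typical norm of N(0, I).
    return mean * (np.sqrt(DIM) / np.linalg.norm(mean))

def noisy_start(primed, mix=0.6, n=256):
    """Blend the primed latent with fresh noise; the weights are chosen so the
    expected squared norm stays at DIM."""
    noise = rng.normal(size=(n, DIM))
    return np.sqrt(1.0 - mix ** 2) * primed + mix * noise

primed = primed_latent()
primed_scores = concept_score(noisy_start(primed))
baseline_scores = concept_score(rng.normal(size=(256, DIM)))
```

The `mix` parameter trades concept strength against sample diversity: at `mix=0` every sample starts from the same latent, and at `mix=1` the priming is gone.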

What To Inspect

sample grid — generate a 4×4 grid of samples with and without the edit; visually confirm the concept shows up
concept-classifier score distribution — if you have an external classifier that detects the target concept, score samples with and without the edit; look at the full distribution, not just the mean
unintended changes — does the edit also break unrelated prompts? A "fire hydrant direction" should not also change "a photo of a sunset"
magnitude sweep — generate samples at α ∈ {0.25, 0.5, 1.0, 2.0, 4.0} and find the sweet spot
training-image overfit (for textual inversion) — does v* reproduce the training images verbatim? That means you overfit and lost generality
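A small helper can make the distribution and magnitude-sweep checks routine. The sketch below assumes you can supply a `score_fn(alpha)` that generates samples at a given α and returns their concept-classifier scores. The `toy_score_fn` here is a synthetic stand-in in which the edit only "takes" on a fraction of samples, and both function names are hypothetical.

```python
import numpy as np

def summarize(scores):
    """Quantiles expose bimodal or long-tailed distributions that a mean hides."""
    q10, q50, q90 = np.quantile(scores, [0.1, 0.5, 0.9])
    return {"mean": float(np.mean(scores)), "q10": float(q10),
            "q50": float(q50), "q90": float(q90)}

def magnitude_sweep(score_fn, alphas):
    """score_fn(alpha) -> array of concept scores for samples generated at alpha."""
    return {alpha: summarize(score_fn(alpha)) for alpha in alphas}

def toy_score_fn(alpha, n=1000, seed=0):
    """Synthetic scores: a fraction min(1, 0.25 * alpha) of samples show the concept."""
    rng = np.random.default_rng(seed)
    takes = rng.random(n) < min(1.0, 0.25 * alpha)
    return np.where(takes, 0.9, 0.1) + rng.normal(scale=0.02, size=n)

report = magnitude_sweep(toy_score_fn, [0.0, 0.25, 0.5, 1.0, 2.0, 4.0])
```

In this toy, α = 2.0 has a mean near 0.5 even though no individual sample scores near 0.5: half the samples show the concept and half do not. That is the case the quantiles catch and the mean misses.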

Failure Pattern

embedding explosion. The norm of a learned delta grows without bound when nothing regularizes it, and sample quality degrades. Fix: L2-regularize the delta or clip its norm.
concept leakage. The edit is "too strong": every sample now has a fire hydrant, even for prompts where that is absurd. Fix: reduce α, or gate the edit so that α * direction is applied only when the user prompt semantically allows it.
textual-inversion overfit. With too few training images or too many steps, v* memorizes exact training images. Fix: fewer steps, more diverse augmentations, or a smaller token-embedding learning rate.
classifier-free guidance clipping. Very high CFG scales (roughly >15) produce oversaturated images with artifacts; the visual effect is "burned" colors and unnatural contrast. Fix: start in the 5-12 range and sweep from there, since the usable range depends on the model and sampler.
evaluating with the wrong metric. "Does the concept appear?" is what the task wants, but you measure FID or CLIP-score. Those do not directly capture concept presence. Use a task-specific classifier or the exact evaluator the task uses.

Quick Checks

is the generator truly frozen? (all parameters with requires_grad=False; only the delta has requires_grad=True)
is the delta magnitude bounded? (norm clip or L2 regularization in the loss)
are you evaluating on the same classifier/metric the task uses, not a proxy?
did you compare against the no-edit baseline? ("my edit works" means "my edit works compared to no edit with the same prompts")
have you checked unrelated prompts still look normal?
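The first two checks can be turned into assertions. This is a minimal PyTorch sketch on a toy network, not a real generator: `generator`, `delta`, `max_norm`, and `trainable_count` are illustrative names. The pattern is to freeze with `requires_grad_(False)`, snapshot the weights, optimize only the delta, and confirm afterwards that nothing else moved.

```python
import torch

torch.manual_seed(0)
generator = torch.nn.Sequential(
    torch.nn.Linear(8, 16), torch.nn.Tanh(), torch.nn.Linear(16, 4)
)
generator.requires_grad_(False)   # freeze every weight
generator.eval()

delta = torch.nn.Parameter(torch.zeros(8))   # the only trainable tensor
max_norm = 0.5

def trainable_count(module):
    """Number of scalar parameters that would receive gradients."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

weights_before = [p.detach().clone() for p in generator.parameters()]
base_input, target = torch.randn(8), torch.ones(4)
optimizer = torch.optim.SGD([delta], lr=0.1)

for _ in range(50):
    loss = torch.nn.functional.mse_loss(generator(base_input + delta), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    with torch.no_grad():                     # norm clip after each step
        norm = delta.norm()
        if norm > max_norm:
            delta.mul_(max_norm / norm)

weights_unchanged = all(
    torch.equal(before, after)
    for before, after in zip(weights_before, generator.parameters())
)
```

Comparing weight snapshots is a stronger check than inspecting `requires_grad` alone, because it also catches an optimizer that was accidentally handed the generator's parameters.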

Practice

Run academy/labs/generative-model-steering/src/steering_workflow.py:

trains a small conditional pixel generator (toy VAE decoder, not a real diffusion model — just enough to demonstrate the mechanics) on a synthetic shape dataset
freezes the generator and learns a textual-inversion-style new token embedding that makes it produce a held-out shape class it was not trained to generate
adds an embedding-arithmetic experiment: compute a "size → large" direction and demonstrate it transfers
demonstrates the "LLM-judge-in-the-loop" black-box search pattern: given a frozen scorer that rates generator outputs, search over prompts with a small beam to maximize the score

You should leave able to explain when frozen-model steering is the right move, what textual inversion optimizes, and why the same search pattern ("optimize inputs against a frozen evaluator") applies whether the evaluator is a classifier, a generator, or an LLM judge.
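The black-box search pattern can be illustrated without any model at all. The sketch below is not the lab script; it is a hypothetical minimal beam search in which `toy_judge` stands in for a frozen evaluator that can only be queried, never differentiated.

```python
def toy_judge(prompt):
    """Frozen black-box scorer (toy): rewards distinct target words, penalizes length."""
    words = prompt.split()
    hits = len({"red", "large", "circle"} & set(words))
    return hits - 0.1 * len(words)

def beam_search(start, vocabulary, score_fn, beam_width=2, rounds=3):
    """Greedy beam search over prompts; only score_fn's outputs are used."""
    beam = [start]
    for _ in range(rounds):
        # Keep current prompts in the pool so the best score never decreases.
        pool = set(beam) | {f"{p} {w}" for p in beam for w in vocabulary}
        # Sort by score (descending), then alphabetically for deterministic ties.
        beam = sorted(pool, key=lambda p: (-score_fn(p), p))[:beam_width]
    return beam[0], score_fn(beam[0])

vocabulary = ["red", "blue", "large", "small", "circle", "square"]
best_prompt, best_score = beam_search("a shape", vocabulary, toy_judge)
```

Swapping `toy_judge` for a classifier score or an LLM judge's rating changes nothing in the search loop, which is the sense in which the pattern is evaluator-agnostic.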

Longer Connection

"Optimize inputs against a frozen model" is a design pattern with many instances:

textual inversion — learn an input token to match a target image distribution
adversarial examples — learn an input perturbation to flip a classifier
prompt engineering by search — search over prompts to maximize an external judge's score (see Black-Box LLM Optimization)
adversarial patch attacks — learn a fixed image patch that, when overlaid, triggers a target class

This family of techniques is sometimes called input-space optimization. It is related to, but distinct from, activation steering, which edits a model's intermediate activations rather than its inputs. Input-space methods are powerful precisely because they do not require retraining the target model, so they apply where retraining is impossible (API-only models, licensed weights, compute-limited settings). They are also the natural way to engage with any "the model is frozen, here is a budget for deltas on the inputs" task.