MLP Training Baseline¶
What This Is¶
An MLP is the smallest useful neural baseline for fixed-width numeric features. This page is not about building a deep architecture zoo. It is about one decision:
- does a small neural baseline beat the simpler baseline honestly enough to justify the extra training complexity?
Use an MLP after the split, preprocessing path, and simpler baseline already exist. If those pieces are still shaky, this page is too early.
When You Use It¶
- tabular or vectorized features are ready
- a linear or tree baseline already exists
- you want to test whether nonlinear interactions add real value
- you need a controlled first PyTorch training loop before heavier models
Before You Start¶
You should already have:
- a fixed train and validation split
- scaled numeric features or a defensible preprocessing path
- a simple baseline such as logistic regression or a tree
- a metric that matches the task
If the simple baseline is missing, add it first.
Baseline Recipe¶
Start with one hidden layer, one optimizer, and one validation rule:
```python
import torch
import torch.nn as nn
import torch.optim as optim


class SimpleMLP(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(hidden_dim, output_dim),
        )

    def forward(self, x):
        return self.net(x)


model = SimpleMLP(input_dim=20, hidden_dim=64, output_dim=3)
criterion = nn.CrossEntropyLoss()
optimizer = optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
```
That is enough to answer the first question: does a small nonlinear model help on this feature set at all?
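One way to answer that question honestly is to train a linear model and a small MLP with the exact same loss, optimizer, and split, then compare validation accuracy. The sketch below does this on synthetic data; the data shape, label rule, epoch count, and learning rate are all illustrative assumptions, and a single `nn.Linear` trained with cross-entropy stands in for logistic regression.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Synthetic data with a deliberately nonlinear label rule (illustrative only).
X = torch.randn(2000, 20)
y = ((X[:, 0] * X[:, 1] + X[:, 2] ** 2) > 0.5).long()
X_train, y_train = X[:1500], y[:1500]
X_val, y_val = X[1500:], y[1500:]


def fit_and_score(model, epochs=200):
    """Full-batch training with one loss and one optimizer, same for both models."""
    opt = torch.optim.AdamW(model.parameters(), lr=1e-2)
    crit = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        crit(model(X_train), y_train).backward()
        opt.step()
    with torch.no_grad():
        return (model(X_val).argmax(dim=1) == y_val).float().mean().item()


# A lone nn.Linear trained with cross-entropy is multinomial logistic regression.
linear_acc = fit_and_score(nn.Linear(20, 2))
mlp_acc = fit_and_score(nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2)))
print(f"linear {linear_acc:.3f} vs MLP {mlp_acc:.3f}")
```

Because everything except the architecture is held fixed, any gap between the two numbers is attributable to the nonlinearity, which is exactly the comparison this page is about.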
What To Inspect Every Run¶
Do not watch only the final accuracy. Inspect:
- training loss versus validation loss
- the best validation checkpoint, not just the last epoch
- whether the MLP actually beats the simpler baseline
- whether the gain is broad or only on one slice
- whether optimization is stable or noisy
If the MLP wins by a tiny amount but adds much more instability, the simpler baseline may still be the better answer.
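Selecting the best validation checkpoint rather than the last epoch takes only a few lines. Here is a minimal sketch of that bookkeeping; the per-epoch validation losses are made-up numbers standing in for a real validation pass, and the tiny `nn.Linear` stands in for the MLP.

```python
import copy

import torch.nn as nn

model = nn.Linear(4, 2)
# Hypothetical per-epoch validation losses, for illustration only.
val_losses = [0.92, 0.71, 0.64, 0.66, 0.70]

best_loss, best_epoch, best_state = float("inf"), -1, None
for epoch, val_loss in enumerate(val_losses):
    # ...one epoch of training and a real validation pass would run here...
    if val_loss < best_loss:
        best_loss, best_epoch = val_loss, epoch
        # deepcopy the weights so later epochs cannot overwrite the snapshot
        best_state = copy.deepcopy(model.state_dict())

model.load_state_dict(best_state)  # restore the best checkpoint, not the last
print(best_epoch, best_loss)  # → 2 0.64
```

In this made-up run the last epoch (loss 0.70) is clearly worse than epoch 2 (loss 0.64), which is precisely the situation where last-checkpoint selection quietly reports a weaker model.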
Decision Ladder¶
Use the curves to choose the next move:
| Pattern | Likely story | Next move |
|---|---|---|
| train and validation both weak | features or preprocessing are the bottleneck | go back to the data path before enlarging the network |
| train improves but validation stalls | overfitting starts early | add regularization, stop earlier, or keep the simpler model |
| loss is unstable or spikes | optimization is too aggressive | lower learning rate, add clipping, check feature scale |
| MLP beats the linear model cleanly | nonlinear interactions matter | keep the MLP and inspect which slices improved |
| MLP barely beats the linear model | complexity may not be worth it | prefer the simpler baseline unless a specific slice improved |
Controlled Training Loop¶
The loop should stay easy to reason about:
```python
for inputs, labels in train_loader:
    optimizer.zero_grad()
    logits = model(inputs)
    loss = criterion(logits, labels)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    optimizer.step()
```
Keep the first loop plain. Add schedulers, wider networks, or deeper stacks only after the small baseline is behaving sensibly.
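The matching validation pass should be just as plain. A sketch, assuming a `val_loader` shaped like the training loader (the stand-in model and random batches below are illustrative only):

```python
import torch
import torch.nn as nn

model = nn.Linear(20, 3)
criterion = nn.CrossEntropyLoss()
# Hypothetical stand-in for a real val_loader: four batches of random data.
val_loader = [(torch.randn(8, 20), torch.randint(0, 3, (8,))) for _ in range(4)]

model.eval()  # disables dropout so validation numbers are honest
total_loss, total_correct, total_count = 0.0, 0, 0
with torch.no_grad():  # no gradients needed during evaluation
    for inputs, labels in val_loader:
        logits = model(inputs)
        total_loss += criterion(logits, labels).item() * len(labels)
        total_correct += (logits.argmax(dim=1) == labels).sum().item()
        total_count += len(labels)
val_loss = total_loss / total_count
val_acc = total_correct / total_count
model.train()  # back to training mode before the next epoch
```

The `model.eval()` / `model.train()` pair matters here because the recipe above uses dropout: forgetting it makes validation loss look noisier and worse than it really is.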
Failure Pattern¶
The biggest failure is treating the MLP as the first serious model instead of a comparison point.
That usually causes three bad habits:
- no simpler baseline to compare against
- too many knobs changed at once
- confusion about whether the problem is the data or the optimizer
Common Mistakes¶
- jumping to a large network before a one-hidden-layer baseline
- forgetting to compare against logistic regression or another simple baseline
- using unscaled features and blaming the optimizer
- selecting the last checkpoint instead of the best validation checkpoint
- trusting one headline score without checking the loss curves
- widening the network before fixing overfitting or instability
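The unscaled-features mistake is cheap to rule out. A minimal standardization sketch (the column scales below are made-up to exaggerate the problem): fit the mean and standard deviation on the training split only, then reuse those statistics for validation and test.

```python
import torch

torch.manual_seed(0)
# Columns on wildly different scales, for illustration.
X_train = torch.randn(100, 3) * torch.tensor([1.0, 1000.0, 0.01])

# Fit the scaling statistics on the training split only.
mean, std = X_train.mean(dim=0), X_train.std(dim=0)
X_scaled = (X_train - mean) / std

# Validation and test features must reuse the same mean and std.
print(X_scaled.std(dim=0))  # each column now has unit spread
```

If training loss spikes or stalls on raw features but behaves after this transform, the problem was the data path, not the optimizer.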
A Good MLP Note¶
After one run, the learner should be able to say:
- which simple baseline it was compared against
- whether the MLP improved validation honestly
- which checkpoint was selected
- what the loss curves said about fit quality
- what change should be tried next, if any
Running This Example¶
Use the local recipe:
```shell
academy/.venv/bin/python academy/examples/deep-learning-recipes/mlp_training_recipe.py
```
If the environment is not ready yet, use Getting Started.
Practice¶
- Compare the MLP against a simple linear baseline on the same split.
- Plot train and validation loss and explain what they imply.
- Change dropout or weight decay and describe what moved.
- Lower the learning rate and check whether instability improves.
- Decide whether the extra complexity is justified on this task.
Longer Connection¶
Continue with PyTorch Training Loops for the full loop structure, Optimizers and Regularization for the next tuning decisions, and Transfer and Fine-Tuning when the baseline question shifts from "train from scratch" to "reuse a representation."