MLP Training Baseline¶
What This Is¶
An MLP is the smallest useful neural baseline for fixed-width numeric features. This page is not about building a deep architecture zoo. It is about one decision:
- does a small neural baseline beat the simpler baseline honestly enough to justify the extra training complexity?
Use an MLP after the split, preprocessing path, and simpler baseline already exist. If those pieces are still shaky, this page is too early.
When You Use It¶
- tabular or vectorized features are ready
- a linear or tree baseline already exists
- you want to test whether nonlinear interactions add real value
- you need a controlled first PyTorch training loop before heavier models
Before You Start¶
You should already have:
- a fixed train and validation split
- scaled numeric features or a defensible preprocessing path
- a simple baseline such as logistic regression or a tree
- a metric that matches the task
If the simple baseline is missing, add it first.
Baseline Recipe¶
Start with one hidden layer, one optimizer, and one validation rule:
```python
import torch
import torch.nn as nn
import torch.optim as optim


class SimpleMLP(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(hidden_dim, output_dim),
        )

    def forward(self, x):
        return self.net(x)


model = SimpleMLP(input_dim=20, hidden_dim=64, output_dim=3)
criterion = nn.CrossEntropyLoss()
optimizer = optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
```
That is enough to answer the first question: does a small nonlinear model help on this feature set at all?
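One way to answer that question honestly is to train a linear model and a small MLP with the exact same loss, optimizer, and split, then compare validation accuracy. The sketch below does this on synthetic data; the data shape, label rule, epoch count, and learning rate are all illustrative assumptions, and a single `nn.Linear` trained with cross-entropy stands in for logistic regression.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Synthetic data with a deliberately nonlinear label rule (illustrative only).
X = torch.randn(2000, 20)
y = ((X[:, 0] * X[:, 1] + X[:, 2] ** 2) > 0.5).long()
X_train, y_train = X[:1500], y[:1500]
X_val, y_val = X[1500:], y[1500:]


def fit_and_score(model, epochs=200):
    """Full-batch training with one loss and one optimizer, same for both models."""
    opt = torch.optim.AdamW(model.parameters(), lr=1e-2)
    crit = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        crit(model(X_train), y_train).backward()
        opt.step()
    with torch.no_grad():
        return (model(X_val).argmax(dim=1) == y_val).float().mean().item()


# A lone nn.Linear trained with cross-entropy is multinomial logistic regression.
linear_acc = fit_and_score(nn.Linear(20, 2))
mlp_acc = fit_and_score(nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2)))
print(f"linear {linear_acc:.3f} vs MLP {mlp_acc:.3f}")
```

Because everything except the architecture is held fixed, any gap between the two numbers is attributable to the nonlinearity, which is exactly the comparison this page is about.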
What To Inspect Every Run¶
Do not watch only the final accuracy. Inspect:
- training loss versus validation loss
- the best validation checkpoint, not just the last epoch
- whether the MLP actually beats the simpler baseline
- whether the gain is broad or only on one slice
- whether optimization is stable or noisy
If the MLP wins by a tiny amount but adds much more instability, the simpler baseline may still be the better answer.
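Selecting the best validation checkpoint rather than the last epoch takes only a few lines. Here is a minimal sketch of that bookkeeping; the per-epoch validation losses are made-up numbers standing in for a real validation pass, and the tiny `nn.Linear` stands in for the MLP.

```python
import copy

import torch.nn as nn

model = nn.Linear(4, 2)
# Hypothetical per-epoch validation losses, for illustration only.
val_losses = [0.92, 0.71, 0.64, 0.66, 0.70]

best_loss, best_epoch, best_state = float("inf"), -1, None
for epoch, val_loss in enumerate(val_losses):
    # ...one epoch of training and a real validation pass would run here...
    if val_loss < best_loss:
        best_loss, best_epoch = val_loss, epoch
        # deepcopy the weights so later epochs cannot overwrite the snapshot
        best_state = copy.deepcopy(model.state_dict())

model.load_state_dict(best_state)  # restore the best checkpoint, not the last
print(best_epoch, best_loss)  # → 2 0.64
```

In this made-up run the last epoch (loss 0.70) is clearly worse than epoch 2 (loss 0.64), which is precisely the situation where last-checkpoint selection quietly reports a weaker model.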
Decision Ladder¶
Use the curves to choose the next move:
| Pattern | Likely story | Next move |
|---|---|---|
| train and validation both weak | features or preprocessing are the bottleneck | go back to the data path before enlarging the network |
| train improves but validation stalls | overfitting starts early | add regularization, stop earlier, or keep the simpler model |
| loss is unstable or spikes | optimization is too aggressive | lower learning rate, add clipping, check feature scale |
| MLP beats the linear model cleanly | nonlinear interactions matter | keep the MLP and inspect which slices improved |
| MLP barely beats the linear model | complexity may not be worth it | prefer the simpler baseline unless a specific slice improved |
Controlled Training Loop¶
The loop should stay easy to reason about:
```python
for inputs, labels in train_loader:
    optimizer.zero_grad()
    logits = model(inputs)
    loss = criterion(logits, labels)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    optimizer.step()
```
Keep the first loop plain. Add schedulers, wider networks, or deeper stacks only after the small baseline is behaving sensibly.
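The matching validation pass should be just as plain. A sketch, assuming a `val_loader` shaped like the training loader (the stand-in model and random batches below are illustrative only):

```python
import torch
import torch.nn as nn

model = nn.Linear(20, 3)
criterion = nn.CrossEntropyLoss()
# Hypothetical stand-in for a real val_loader: four batches of random data.
val_loader = [(torch.randn(8, 20), torch.randint(0, 3, (8,))) for _ in range(4)]

model.eval()  # disables dropout so validation numbers are honest
total_loss, total_correct, total_count = 0.0, 0, 0
with torch.no_grad():  # no gradients needed during evaluation
    for inputs, labels in val_loader:
        logits = model(inputs)
        total_loss += criterion(logits, labels).item() * len(labels)
        total_correct += (logits.argmax(dim=1) == labels).sum().item()
        total_count += len(labels)
val_loss = total_loss / total_count
val_acc = total_correct / total_count
model.train()  # back to training mode before the next epoch
```

The `model.eval()` / `model.train()` pair matters here because the recipe above uses dropout: forgetting it makes validation loss look noisier and worse than it really is.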
Failure Pattern¶
The biggest failure is treating the MLP as the first serious model instead of a comparison point.
That usually causes three bad habits:
- no simpler baseline to compare against
- too many knobs changed at once
- confusion about whether the problem is the data or the optimizer
Common Mistakes¶
- jumping to a large network before a one-hidden-layer baseline
- forgetting to compare against logistic regression or another simple baseline
- using unscaled features and blaming the optimizer
- selecting the last checkpoint instead of the best validation checkpoint
- trusting one headline score without checking the loss curves
- widening the network before fixing overfitting or instability
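The unscaled-features mistake is cheap to rule out. A minimal standardization sketch (the column scales below are made-up to exaggerate the problem): fit the mean and standard deviation on the training split only, then reuse those statistics for validation and test.

```python
import torch

torch.manual_seed(0)
# Columns on wildly different scales, for illustration.
X_train = torch.randn(100, 3) * torch.tensor([1.0, 1000.0, 0.01])

# Fit the scaling statistics on the training split only.
mean, std = X_train.mean(dim=0), X_train.std(dim=0)
X_scaled = (X_train - mean) / std

# Validation and test features must reuse the same mean and std.
print(X_scaled.std(dim=0))  # each column now has unit spread
```

If training loss spikes or stalls on raw features but behaves after this transform, the problem was the data path, not the optimizer.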
A Good MLP Note¶
After one run, the learner should be able to say:
- which simple baseline it was compared against
- whether the MLP improved validation honestly
- which checkpoint was selected
- what the loss curves said about fit quality
- what change should be tried next, if any
Running This Example¶
Use the local recipe:
```shell
academy/.venv/bin/python academy/examples/deep-learning-recipes/mlp_training_recipe.py
```
If the environment is not ready yet, use Getting Started.
Practice¶
- Compare the MLP against a simple linear baseline on the same split.
- Plot train and validation loss and explain what they imply.
- Change dropout or weight decay and describe what moved.
- Lower the learning rate and check whether instability improves.
- Decide whether the extra complexity is justified on this task.
Longer Connection¶
Continue with PyTorch Training Loops for the full loop structure, Optimizers and Regularization for the next tuning decisions, and Transfer and Fine-Tuning when the baseline question shifts from "train from scratch" to "reuse a representation."