MLP Training Baseline

What This Is

An MLP is the smallest useful neural baseline for fixed-width numeric features. This page is not about building a deep architecture zoo. It is about one decision:

  • does a small neural baseline beat the simpler baseline honestly enough to justify the extra training complexity?

Use an MLP after the split, preprocessing path, and simpler baseline already exist. If those pieces are still shaky, this page is too early.

When You Use It

  • tabular or vectorized features are ready
  • a linear or tree baseline already exists
  • you want to test whether nonlinear interactions add real value
  • you need a controlled first PyTorch training loop before heavier models

Before You Start

You should already have:

  • a fixed train and validation split
  • scaled numeric features or a defensible preprocessing path
  • a simple baseline such as logistic regression or a tree
  • a metric that matches the task

If the simple baseline is missing, add it first.
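The prerequisites above can be sketched in a few lines. This is a hedged illustration, not the page's own recipe: the synthetic features and labels stand in for real data, and the split and scaler settings are example choices.

```python
# Sketch of the prerequisites: a fixed split, train-fit scaling,
# and a logistic-regression baseline. Data here is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))                    # stand-in for real features
y = (X[:, 0] + X[:, 1] ** 2 > 0.5).astype(int)    # stand-in labels

# Fixed split: reuse the same random_state for every later model,
# including the MLP, so comparisons are honest.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit the scaler on train only, then apply it to validation.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_val_s = scaler.transform(X_val)

baseline = LogisticRegression(max_iter=1000).fit(X_train_s, y_train)
baseline_acc = baseline.score(X_val_s, y_val)
print(f"baseline validation accuracy: {baseline_acc:.3f}")
```

Whatever number this prints is the bar the MLP has to clear on the same split.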

Baseline Recipe

Start with one hidden layer, one optimizer, and one validation rule:

import torch
import torch.nn as nn
import torch.optim as optim

class SimpleMLP(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.2),  # light regularization from the start
            nn.Linear(hidden_dim, output_dim),  # raw logits, no softmax
        )

    def forward(self, x):
        return self.net(x)

model = SimpleMLP(input_dim=20, hidden_dim=64, output_dim=3)
criterion = nn.CrossEntropyLoss()  # expects raw logits and integer class labels
optimizer = optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)

That is enough to answer the first question: does a small nonlinear model help on this feature set at all?
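The training loop later on this page iterates over a `train_loader`. One hypothetical way to wire that up, with synthetic tensors standing in for real scaled features (the shapes match the `input_dim=20`, `output_dim=3` model above; batch size is an example choice):

```python
# Hypothetical data wiring for the training loop: synthetic tensors
# stand in for real scaled features and integer class labels.
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
X_train = torch.randn(256, 20)         # 20 features, matching input_dim=20
y_train = torch.randint(0, 3, (256,))  # 3 classes, matching output_dim=3

train_loader = DataLoader(TensorDataset(X_train, y_train),
                          batch_size=32, shuffle=True)

inputs, labels = next(iter(train_loader))
print(inputs.shape, labels.shape)  # torch.Size([32, 20]) torch.Size([32])
```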

What To Inspect Every Run

Do not watch only the final accuracy. Inspect:

  • training loss versus validation loss
  • the best validation checkpoint, not just the last epoch
  • whether the MLP actually beats the simpler baseline
  • whether the gain is broad or only on one slice
  • whether optimization is stable or noisy

If the MLP wins by a tiny amount but adds much more instability, the simpler baseline may still be the better answer.
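To check whether a gain is broad or concentrated on one slice, per-class validation accuracy is a quick first cut. A minimal sketch, where `val_labels` and `val_preds` are hypothetical arrays of true labels and model predictions:

```python
# Sketch: per-class validation accuracy, to see whether a gain is
# broad or concentrated on one slice. Arrays here are illustrative.
import numpy as np

val_labels = np.array([0, 0, 1, 1, 1, 2, 2, 2, 2, 2])
val_preds  = np.array([0, 1, 1, 1, 0, 2, 2, 2, 1, 2])

per_class = {}
for cls in np.unique(val_labels):
    mask = val_labels == cls
    per_class[int(cls)] = float((val_preds[mask] == val_labels[mask]).mean())

print(per_class)  # accuracy per class label
```

The same pattern works for any slice column (region, device type, time bucket), not just the class label: mask the validation set by the slice value and compare both models inside the mask.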

Decision Ladder

Use the curves to choose the next move:

Pattern | Likely story | Next move
train and validation both weak | features or preprocessing are the bottleneck | go back to the data path before enlarging the network
train improves but validation stalls | overfitting starts early | add regularization, stop earlier, or keep the simpler model
loss is unstable or spikes | optimization is too aggressive | lower the learning rate, add clipping, check feature scale
MLP beats the linear model cleanly | nonlinear interactions matter | keep the MLP and inspect which slices improved
MLP barely beats the linear model | complexity may not be worth it | prefer the simpler baseline unless a specific slice improved

Controlled Training Loop

The loop should stay easy to reason about:

for inputs, labels in train_loader:
    optimizer.zero_grad()
    logits = model(inputs)
    loss = criterion(logits, labels)
    loss.backward()
    # clip gradients to guard against loss spikes
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    optimizer.step()

Keep the first loop plain. Add schedulers, wider networks, or deeper stacks only after the small baseline is behaving sensibly.
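One way to select the best validation checkpoint rather than the last epoch is to wrap the plain loop in an epoch loop with a validation pass. This is a sketch under assumptions: the tiny model, synthetic loaders, and epoch count are illustrative stand-ins.

```python
# Sketch: epoch loop with validation and best-checkpoint selection.
# Model and data are synthetic stand-ins for the real ones.
import copy
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 3))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

train_loader = DataLoader(TensorDataset(torch.randn(128, 20),
                                        torch.randint(0, 3, (128,))),
                          batch_size=32, shuffle=True)
val_loader = DataLoader(TensorDataset(torch.randn(64, 20),
                                      torch.randint(0, 3, (64,))),
                        batch_size=32)

best_val_loss, best_state = float("inf"), None
for epoch in range(3):
    model.train()  # enable dropout during training
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        optimizer.step()

    # Validation pass: eval mode, no gradients.
    model.eval()
    with torch.no_grad():
        val_loss = sum(criterion(model(x), y).item()
                       for x, y in val_loader) / len(val_loader)

    # Keep the best validation checkpoint, not the last epoch.
    if val_loss < best_val_loss:
        best_val_loss, best_state = val_loss, copy.deepcopy(model.state_dict())

model.load_state_dict(best_state)  # restore the selected checkpoint
```

Logging `train_loss` and `val_loss` per epoch from this same loop gives the curves the decision ladder above reads from.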

Failure Pattern

The biggest failure is treating the MLP as the first serious model instead of a comparison point.

That usually causes three bad habits:

  • no simpler baseline to compare against
  • too many knobs changed at once
  • confusion about whether the problem is the data or the optimizer

Common Mistakes

  • jumping to a large network before a one-hidden-layer baseline
  • forgetting to compare against logistic regression or another simple baseline
  • using unscaled features and blaming the optimizer
  • selecting the last checkpoint instead of the best validation checkpoint
  • trusting one headline score without checking the loss curves
  • widening the network before fixing overfitting or instability

A Good MLP Note

After one run, the learner should be able to say:

  • which simple baseline it was compared against
  • whether the MLP improved validation honestly
  • which checkpoint was selected
  • what the loss curves said about fit quality
  • what change should be tried next, if any

Running This Example

Use the local recipe:

academy/.venv/bin/python academy/examples/deep-learning-recipes/mlp_training_recipe.py

If the environment is not ready yet, use Getting Started.

Practice

  1. Compare the MLP against a simple linear baseline on the same split.
  2. Plot train and validation loss and explain what they imply.
  3. Change dropout or weight decay and describe what moved.
  4. Lower the learning rate and check whether instability improves.
  5. Decide whether the extra complexity is justified on this task.

Longer Connection

Continue with PyTorch Training Loops for the full loop structure, Optimizers and Regularization for the next tuning decisions, and Transfer and Fine-Tuning when the baseline question shifts from "train from scratch" to "reuse a representation."