Plotting for Model Debugging¶
What This Is¶
Plotting is not decoration. It is a debugging tool for answering a specific question about the data, the model, or the split.
The most useful plots are the ones that help you decide what to inspect next. In AI Academy, that usually means one of four questions:
- are the values distributed the way I expected
- do two variables move together
- do groups behave differently
- do the mistakes cluster somewhere obvious
When You Use It¶
- checking target balance and feature spread
- comparing group summaries with counts
- looking for overlap, outliers, or strange ranges
- inspecting model errors and residual patterns
- checking whether a feature-target story is real or just noise
Tooling¶
- `matplotlib.pyplot.subplots`
- `Axes.scatter`
- `Axes.hist`
- `Axes.boxplot`
- `Axes.bar`
- `Axes.axhline` and `Axes.axvline`
- `Axes.annotate`
- `Axes.legend`
- `Figure.tight_layout`
- `Figure.savefig`
- `seaborn.histplot`
- `seaborn.boxplot`
- `seaborn.scatterplot`
- `seaborn.heatmap`
Why The Object-Oriented API Matters¶
Use `plt.subplots()` to get a `Figure` and one or more `Axes`. That makes it easier to keep one plot focused on one question.
The pattern is:
```python
import matplotlib.pyplot as plt

figure, axis = plt.subplots(figsize=(6, 4))
axis.set_title("Feature versus target")
axis.set_xlabel("feature_name")
axis.set_ylabel("target_rate")
figure.tight_layout()
```
This matters because the object-oriented style makes it obvious which plot you are editing, especially when you compare several figures in one workflow.
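When comparing several views at once, one call to `plt.subplots()` can lay out multiple axes in the same figure, and the object names keep each plot unambiguous. A minimal sketch with synthetic data (the `scores` array is a stand-in for a real feature column):

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
scores = rng.normal(70, 10, size=200)  # hypothetical feature values

# One figure, two axes: each axis answers one question.
figure, (left, right) = plt.subplots(1, 2, figsize=(10, 4))
left.hist(scores, bins=20)
left.set_title("Distribution")
right.scatter(range(len(scores)), scores, alpha=0.5)
right.set_title("Values in row order")
figure.tight_layout()
```

Because every call names its axis explicitly, there is no ambiguity about which panel a title or label belongs to.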
Question 1: What Is The Distribution?¶
The first debugging question is often: what does this feature look like by itself?
Useful functions:
- `Axes.hist` for a quick distribution check
- `seaborn.histplot` when you want a slightly cleaner default view
- `Axes.axvline` when you need to mark a threshold or a reference value
Minimal pattern:
```python
figure, axis = plt.subplots(figsize=(6, 4))
axis.hist(df["score"], bins=20)
axis.axvline(df["score"].median(), color="black", linestyle="--", linewidth=1)
axis.set_title("Score distribution")
axis.set_xlabel("score")
axis.set_ylabel("count")
figure.tight_layout()
```
What to look for:
- extreme skew
- a suspiciously narrow range
- values clipped at a boundary
- a spike at zero, one, or another special value
Common mistake:
- reading a pretty histogram without checking the bin count or the actual numeric range
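One way to avoid that mistake is to print the numbers behind the histogram before trusting the picture. A minimal sketch using `numpy.histogram` on a hypothetical `df["score"]` column:

```python
import numpy as np
import pandas as pd

# Hypothetical data with a spike at the upper boundary.
df = pd.DataFrame({"score": [0.1, 0.5, 0.5, 0.7, 0.9, 1.0, 1.0, 1.0]})

# The same binning a histogram would use, as plain numbers.
counts, edges = np.histogram(df["score"], bins=4)
print(df["score"].min(), df["score"].max())  # the actual numeric range
print(counts)  # a spike shows up here as one large bin count
```

If the printed range or bin counts disagree with the shape you thought you saw, inspect the data before restyling the plot.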
Question 2: Do Two Variables Move Together?¶
Scatter plots are useful when you want to understand a relationship between two numeric columns.
Useful functions:
- `Axes.scatter`
- `seaborn.scatterplot`
- `Axes.annotate`
- `Axes.axhline` and `Axes.axvline`
Applied pattern:
```python
figure, axis = plt.subplots(figsize=(6, 4))
axis.scatter(df["quiz_average"], df["attendance_rate"], alpha=0.6)
axis.axhline(0.5, color="gray", linestyle=":", linewidth=1)
axis.set_xlabel("quiz_average")
axis.set_ylabel("attendance_rate")
axis.set_title("Attendance versus quiz average")
figure.tight_layout()
```
If one point matters, annotate it:
```python
axis.annotate("outlier", (df.loc[i, "quiz_average"], df.loc[i, "attendance_rate"]))
```
What to look for:
- clear upward or downward trend
- a sharp boundary near a threshold
- a cluster of outliers
- a curved relationship that a linear model may miss
Common mistake:
- ignoring overplotting when many points share the same region
Quick trick:
- use `alpha` to reveal overlap
- if the plot is still crowded, switch to a binned view or a summary plot
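When `alpha` is not enough, one binned-view option is to cut the x variable into intervals and plot the mean of y per interval. A sketch with synthetic data standing in for the quiz and attendance columns:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({"quiz_average": rng.uniform(0, 1, n)})
# A noisy upward relationship, clipped to the valid rate range.
df["attendance_rate"] = (
    0.3 + 0.6 * df["quiz_average"] + rng.normal(0, 0.05, n)
).clip(0, 1)

# Cut x into 10 intervals and take the mean of y per interval.
df["bin"] = pd.cut(df["quiz_average"], bins=10)
summary = df.groupby("bin", observed=True)["attendance_rate"].mean()

figure, axis = plt.subplots(figsize=(6, 4))
axis.bar(range(len(summary)), summary.values)
axis.set_xlabel("quiz_average bin")
axis.set_ylabel("mean attendance_rate")
axis.set_title("Binned view of the same relationship")
figure.tight_layout()
```

The trend that was hidden under 500 overlapping points becomes ten bars that either rise, fall, or do nothing.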
Question 3: How Do Groups Compare?¶
When the data is grouped, a box plot or grouped bar plot often says more than a single average.
Useful functions:
- `Axes.bar`
- `Axes.boxplot`
- `seaborn.boxplot`
- `seaborn.histplot` with `hue`
Applied pattern:
```python
figure, axis = plt.subplots(figsize=(7, 4))
axis.bar(summary["group"], summary["mean_rate"])
axis.set_title("Mean rate by group")
axis.set_xlabel("group")
axis.set_ylabel("mean_rate")
figure.tight_layout()
```
If you also care about spread, use a box plot:
```python
figure, axis = plt.subplots(figsize=(7, 4))
axis.boxplot([group_a, group_b, group_c], labels=["A", "B", "C"])
axis.set_title("Feature spread by group")
axis.set_ylabel("feature_value")
figure.tight_layout()
```
What to look for:
- whether the group mean is supported by enough rows
- whether the spread is wider than the mean difference
- whether one group has much more variability than the others
- whether the grouping variable is actually doing anything
Common mistake:
- comparing group averages without checking counts or spread
Quick trick:
- sort categories before making a bar plot so the eye can compare the ranks quickly
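The sorting trick can be sketched like this, with a hypothetical grouped table:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical grouped summary table.
summary = pd.DataFrame({
    "group": ["B", "A", "C"],
    "mean_rate": [0.4, 0.7, 0.2],
})

# Sort by the value so the ranks read left to right.
ordered = summary.sort_values("mean_rate", ascending=False)

figure, axis = plt.subplots(figsize=(7, 4))
axis.bar(ordered["group"], ordered["mean_rate"])
axis.set_title("Mean rate by group (sorted)")
axis.set_xlabel("group")
axis.set_ylabel("mean_rate")
figure.tight_layout()
```

Sorting costs one line and saves the reader from mentally reordering the bars.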
Question 4: Where Are The Model Errors?¶
Debugging plots become much more informative once they include the model output or the residuals.
Useful functions:
- `Axes.scatter`
- `Axes.hist`
- `seaborn.heatmap`
- `seaborn.scatterplot`
- `Axes.axhline`
For classification, you might plot predicted score versus actual target:
```python
figure, axis = plt.subplots(figsize=(6, 4))
axis.scatter(df["pred_score"], df["target"], alpha=0.4)
axis.set_xlabel("pred_score")
axis.set_ylabel("target")
axis.set_title("Score versus target")
figure.tight_layout()
```
For residual-style inspection in regression, a residual plot is often better:
```python
figure, axis = plt.subplots(figsize=(6, 4))
axis.scatter(df["pred"], df["residual"], alpha=0.5)
axis.axhline(0, color="black", linestyle="--", linewidth=1)
axis.set_xlabel("prediction")
axis.set_ylabel("residual")
axis.set_title("Residual check")
figure.tight_layout()
```
What to look for:
- residuals that fan out as predictions rise
- a cluster of large mistakes in one region
- a group that is systematically overpredicted or underpredicted
Common mistake:
- plotting predictions and targets without understanding which axis is supposed to show the error pattern
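For classification, `seaborn.heatmap` is one way to give a confusion-style table a readable visual form. A sketch with hypothetical binary predictions and targets:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical predictions and targets for a binary classifier.
df = pd.DataFrame({
    "target": [0, 0, 1, 1, 1, 0, 1, 0],
    "pred":   [0, 1, 1, 1, 0, 0, 1, 0],
})

# A confusion-style count table, then a heatmap so the error cells stand out.
matrix = pd.crosstab(df["target"], df["pred"])

figure, axis = plt.subplots(figsize=(4, 3))
sns.heatmap(matrix, annot=True, fmt="d", cmap="Blues", ax=axis)
axis.set_xlabel("pred")
axis.set_ylabel("target")
axis.set_title("Target versus prediction counts")
figure.tight_layout()
```

The off-diagonal cells are the mistakes; if one of them dominates, that is the slice to inspect next.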
Library Notes¶
- `plt.subplots()` is the cleanest starting point because it gives you a figure and axes you can control directly.
- `Axes.scatter` is the fastest way to inspect two numeric features and a target signal.
- `Axes.hist` and `seaborn.histplot` are the quickest way to inspect shape, skew, and suspicious spikes.
- `Axes.boxplot` and `seaborn.boxplot` are useful when you care about spread, not just the mean.
- `Axes.annotate` is useful for naming one or two important points instead of forcing the reader to guess.
- `Figure.tight_layout()` helps keep labels readable when you save the figure.
- `Figure.savefig()` is the right move when the plot needs to live in a workflow artifact.
- `seaborn.heatmap` is helpful when a small matrix or confusion-style table needs a readable visual form.
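A minimal `Figure.savefig` sketch; the output filename here is a hypothetical artifact path:

```python
import matplotlib.pyplot as plt
from pathlib import Path

figure, axis = plt.subplots(figsize=(6, 4))
axis.hist([1, 2, 2, 3, 3, 3], bins=3)
axis.set_title("Score distribution")
figure.tight_layout()

# Hypothetical artifact filename; pick one that matches your workflow.
out = Path("score_distribution.png")
figure.savefig(out, dpi=150)
```

Calling `tight_layout()` before `savefig()` keeps labels from being clipped in the saved file.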
Failure Pattern¶
Making a polished plot before checking counts, units, or missing values. A good-looking figure can still be based on weak or misleading data.
Other traps:
- plotting a tiny slice as if it were representative
- using the wrong axis labels and then forgetting the units
- comparing categories that are not sorted or not aligned with the grouped table
- reading a scatter plot without noticing severe overlap
- choosing a plot type that hides the question you actually need to answer
The right plot is the one that makes the next debugging step obvious.
Practical Tricks¶
- Start with a table, then plot it. Do not plot first and explain later.
- Use `alpha` when points overlap.
- Use a line or threshold marker when the decision boundary matters.
- Use a box plot when spread matters more than the mean.
- Use a heatmap when a matrix is easier to read than a list of numbers.
- Keep one plot focused on one question.
- If the figure needs a long explanation, it is probably trying to answer too many questions at once.
What To Ask Before Trusting The Plot¶
- Did I check the counts before trusting the shape?
- Did I label the axes with real units or just variable names?
- Is the plot answering a question about the data, the split, or the model?
- Would I make the same decision if the plot were less attractive?
- What is the one row or group I should inspect next?
- What would make me choose a different plot type?
Practice¶
- Make one histogram and explain what the shape says about the feature.
- Make one scatter plot with `alpha` and explain why overlap matters.
- Make one grouped bar plot and include the counts in the same reasoning.
- Make one box plot and explain what it tells you that the mean does not.
- Add one reference line and explain why it helps.
- Annotate one outlier and explain why it deserves attention.
- Compare one Matplotlib plot and one seaborn plot for the same question.
- Write one sentence describing the next row or slice you would inspect after the figure.
Runnable Example¶
Open the matching example in AI Academy and run it from the platform.
Inspect whether the figure supports the same story as the table that came before it.
Quick Checks¶
- If the table and plot disagree, inspect the table first.
- If points overlap too much, use transparency or a different plot type.
- If the plot has no order, sort the categories before plotting.
- If the labels are hard to read, simplify the figure or use tighter layout settings.
- If the figure cannot be explained in one sentence, the debug question is probably too broad.
Longer Connection¶
Continue with Python, NumPy, Pandas, Visualization for a fuller EDA workflow.