Plotting for Model Debugging¶
What This Is¶
Plotting is not decoration. It is a debugging tool for answering a specific question about the data, the model, or the split.
The most useful plots are the ones that help you decide what to inspect next. In AI Academy, that usually means one of four questions:
- are the values distributed the way I expected
- do two variables move together
- do groups behave differently
- do the mistakes cluster somewhere obvious
When You Use It¶
- checking target balance and feature spread
- comparing group summaries with counts
- looking for overlap, outliers, or strange ranges
- inspecting model errors and residual patterns
- checking whether a feature-target story is real or just noise
Tooling¶
- `matplotlib.pyplot.subplots`
- `Axes.scatter`
- `Axes.hist`
- `Axes.boxplot`
- `Axes.bar`
- `Axes.axhline` and `Axes.axvline`
- `Axes.annotate`
- `Axes.legend`
- `Figure.tight_layout`
- `Figure.savefig`
- `seaborn.histplot`
- `seaborn.boxplot`
- `seaborn.scatterplot`
- `seaborn.heatmap`
Why The Object-Oriented API Matters¶
Use `plt.subplots()` to get a `Figure` and one or more `Axes`. That makes it easier to keep one plot focused on one question.
The pattern is:
```python
import matplotlib.pyplot as plt

figure, axis = plt.subplots(figsize=(6, 4))
axis.set_title("Feature versus target")
axis.set_xlabel("feature_name")
axis.set_ylabel("target_rate")
figure.tight_layout()
```
This matters because the object-oriented style makes it obvious which plot you are editing, especially when you compare several figures in one workflow.
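When comparing several views at once, one call to `plt.subplots()` can lay out multiple axes in the same figure, and the object names keep each plot unambiguous. A minimal sketch with synthetic data (the `scores` array is a stand-in for a real feature column):

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
scores = rng.normal(70, 10, size=200)  # hypothetical feature values

# One figure, two axes: each axis answers one question.
figure, (left, right) = plt.subplots(1, 2, figsize=(10, 4))
left.hist(scores, bins=20)
left.set_title("Distribution")
right.scatter(range(len(scores)), scores, alpha=0.5)
right.set_title("Values in row order")
figure.tight_layout()
```

Because every call names its axis explicitly, there is no ambiguity about which panel a title or label belongs to.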
Question 1: What Is The Distribution?¶
The first debugging question is often: what does this feature look like by itself?
Useful functions:
- `Axes.hist` for a quick distribution check
- `seaborn.histplot` when you want a slightly cleaner default view
- `Axes.axvline` when you need to mark a threshold or a reference value
Minimal pattern:
```python
figure, axis = plt.subplots(figsize=(6, 4))
axis.hist(df["score"], bins=20)
axis.axvline(df["score"].median(), color="black", linestyle="--", linewidth=1)
axis.set_title("Score distribution")
axis.set_xlabel("score")
axis.set_ylabel("count")
figure.tight_layout()
```
What to look for:
- extreme skew
- a suspiciously narrow range
- values clipped at a boundary
- a spike at zero, one, or another special value
Common mistake:
- reading a pretty histogram without checking the bin count or the actual numeric range
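One way to avoid that mistake is to print the numbers behind the histogram before trusting the picture. A minimal sketch using `numpy.histogram` on a hypothetical `df["score"]` column:

```python
import numpy as np
import pandas as pd

# Hypothetical data with a spike at the upper boundary.
df = pd.DataFrame({"score": [0.1, 0.5, 0.5, 0.7, 0.9, 1.0, 1.0, 1.0]})

# The same binning a histogram would use, as plain numbers.
counts, edges = np.histogram(df["score"], bins=4)
print(df["score"].min(), df["score"].max())  # the actual numeric range
print(counts)  # a spike shows up here as one large bin count
```

If the printed range or bin counts disagree with the shape you thought you saw, inspect the data before restyling the plot.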
Question 2: Do Two Variables Move Together?¶
Scatter plots are useful when you want to understand a relationship between two numeric columns.
Useful functions:
- `Axes.scatter`
- `seaborn.scatterplot`
- `Axes.annotate`
- `Axes.axhline` and `Axes.axvline`
Applied pattern:
```python
figure, axis = plt.subplots(figsize=(6, 4))
axis.scatter(df["quiz_average"], df["attendance_rate"], alpha=0.6)
axis.axhline(0.5, color="gray", linestyle=":", linewidth=1)
axis.set_xlabel("quiz_average")
axis.set_ylabel("attendance_rate")
axis.set_title("Attendance versus quiz average")
figure.tight_layout()
```
If one point matters, annotate it:
```python
axis.annotate("outlier", (df.loc[i, "quiz_average"], df.loc[i, "attendance_rate"]))
```
What to look for:
- clear upward or downward trend
- a sharp boundary near a threshold
- a cluster of outliers
- a curved relationship that a linear model may miss
Common mistake:
- ignoring overplotting when many points share the same region
Quick trick:
- use `alpha` to reveal overlap
- if the plot is still crowded, switch to a binned view or a summary plot
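When `alpha` is not enough, one binned-view option is to cut the x variable into intervals and plot the mean of y per interval. A sketch with synthetic data standing in for the quiz and attendance columns:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({"quiz_average": rng.uniform(0, 1, n)})
# A noisy upward relationship, clipped to the valid rate range.
df["attendance_rate"] = (
    0.3 + 0.6 * df["quiz_average"] + rng.normal(0, 0.05, n)
).clip(0, 1)

# Cut x into 10 intervals and take the mean of y per interval.
df["bin"] = pd.cut(df["quiz_average"], bins=10)
summary = df.groupby("bin", observed=True)["attendance_rate"].mean()

figure, axis = plt.subplots(figsize=(6, 4))
axis.bar(range(len(summary)), summary.values)
axis.set_xlabel("quiz_average bin")
axis.set_ylabel("mean attendance_rate")
axis.set_title("Binned view of the same relationship")
figure.tight_layout()
```

The trend that was hidden under 500 overlapping points becomes ten bars that either rise, fall, or do nothing.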
Question 3: How Do Groups Compare?¶
When the data is grouped, a box plot or grouped bar plot often says more than a single average.
Useful functions:
- `Axes.bar`
- `Axes.boxplot`
- `seaborn.boxplot`
- `seaborn.histplot` with `hue`
Applied pattern:
```python
figure, axis = plt.subplots(figsize=(7, 4))
axis.bar(summary["group"], summary["mean_rate"])
axis.set_title("Mean rate by group")
axis.set_xlabel("group")
axis.set_ylabel("mean_rate")
figure.tight_layout()
```
If you also care about spread, use a box plot:
```python
figure, axis = plt.subplots(figsize=(7, 4))
axis.boxplot([group_a, group_b, group_c], labels=["A", "B", "C"])
axis.set_title("Feature spread by group")
axis.set_ylabel("feature_value")
figure.tight_layout()
```
What to look for:
- whether the group mean is supported by enough rows
- whether the spread is wider than the mean difference
- whether one group has much more variability than the others
- whether the grouping variable is actually doing anything
Common mistake:
- comparing group averages without checking counts or spread
Quick trick:
- sort categories before making a bar plot so the eye can compare the ranks quickly
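The sorting trick can be sketched like this, with a hypothetical grouped table:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical grouped summary table.
summary = pd.DataFrame({
    "group": ["B", "A", "C"],
    "mean_rate": [0.4, 0.7, 0.2],
})

# Sort by the value so the ranks read left to right.
ordered = summary.sort_values("mean_rate", ascending=False)

figure, axis = plt.subplots(figsize=(7, 4))
axis.bar(ordered["group"], ordered["mean_rate"])
axis.set_title("Mean rate by group (sorted)")
axis.set_xlabel("group")
axis.set_ylabel("mean_rate")
figure.tight_layout()
```

Sorting costs one line and saves the reader from mentally reordering the bars.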
Question 4: Where Are The Model Errors?¶
Debugging plots become much more informative once they include the model output or the residuals.
Useful functions:
- `Axes.scatter`
- `Axes.hist`
- `seaborn.heatmap`
- `seaborn.scatterplot`
- `Axes.axhline`
For classification, you might plot predicted score versus actual target:
```python
figure, axis = plt.subplots(figsize=(6, 4))
axis.scatter(df["pred_score"], df["target"], alpha=0.4)
axis.set_xlabel("pred_score")
axis.set_ylabel("target")
axis.set_title("Score versus target")
figure.tight_layout()
```
For residual-style inspection in regression, a residual plot is often better:
```python
figure, axis = plt.subplots(figsize=(6, 4))
axis.scatter(df["pred"], df["residual"], alpha=0.5)
axis.axhline(0, color="black", linestyle="--", linewidth=1)
axis.set_xlabel("prediction")
axis.set_ylabel("residual")
axis.set_title("Residual check")
figure.tight_layout()
```
What to look for:
- residuals that fan out as predictions rise
- a cluster of large mistakes in one region
- a group that is systematically overpredicted or underpredicted
Common mistake:
- plotting predictions and targets without understanding which axis is supposed to show the error pattern
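For classification, `seaborn.heatmap` is one way to give a confusion-style table a readable visual form. A sketch with hypothetical binary predictions and targets:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical predictions and targets for a binary classifier.
df = pd.DataFrame({
    "target": [0, 0, 1, 1, 1, 0, 1, 0],
    "pred":   [0, 1, 1, 1, 0, 0, 1, 0],
})

# A confusion-style count table, then a heatmap so the error cells stand out.
matrix = pd.crosstab(df["target"], df["pred"])

figure, axis = plt.subplots(figsize=(4, 3))
sns.heatmap(matrix, annot=True, fmt="d", cmap="Blues", ax=axis)
axis.set_xlabel("pred")
axis.set_ylabel("target")
axis.set_title("Target versus prediction counts")
figure.tight_layout()
```

The off-diagonal cells are the mistakes; if one of them dominates, that is the slice to inspect next.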
Library Notes¶
- `plt.subplots()` is the cleanest starting point because it gives you a figure and axes you can control directly.
- `Axes.scatter` is the fastest way to inspect two numeric features and a target signal.
- `Axes.hist` and `seaborn.histplot` are the quickest way to inspect shape, skew, and suspicious spikes.
- `Axes.boxplot` and `seaborn.boxplot` are useful when you care about spread, not just the mean.
- `Axes.annotate` is useful for naming one or two important points instead of forcing the reader to guess.
- `Figure.tight_layout()` helps keep labels readable when you save the figure.
- `Figure.savefig()` is the right move when the plot needs to live in a workflow artifact.
- `seaborn.heatmap` is helpful when a small matrix or confusion-style table needs a readable visual form.
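A minimal `Figure.savefig` sketch; the output filename here is a hypothetical artifact path:

```python
import matplotlib.pyplot as plt
from pathlib import Path

figure, axis = plt.subplots(figsize=(6, 4))
axis.hist([1, 2, 2, 3, 3, 3], bins=3)
axis.set_title("Score distribution")
figure.tight_layout()

# Hypothetical artifact filename; pick one that matches your workflow.
out = Path("score_distribution.png")
figure.savefig(out, dpi=150)
```

Calling `tight_layout()` before `savefig()` keeps labels from being clipped in the saved file.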
Failure Pattern¶
Making a polished plot before checking counts, units, or missing values. A good-looking figure can still be based on weak or misleading data.
Other traps:
- plotting a tiny slice as if it were representative
- using the wrong axis labels and then forgetting the units
- comparing categories that are not sorted or not aligned with the grouped table
- reading a scatter plot without noticing severe overlap
- choosing a plot type that hides the question you actually need to answer
The right plot is the one that makes the next debugging step obvious.
Practical Tricks¶
- Start with a table, then plot it. Do not plot first and explain later.
- Use `alpha` when points overlap.
- Use a line or threshold marker when the decision boundary matters.
- Use a box plot when spread matters more than the mean.
- Use a heatmap when a matrix is easier to read than a list of numbers.
- Keep one plot focused on one question.
- If the figure needs a long explanation, it is probably trying to answer too many questions at once.
What To Ask Before Trusting The Plot¶
- Did I check the counts before trusting the shape?
- Did I label the axes with real units or just variable names?
- Is the plot answering a question about the data, the split, or the model?
- Would I make the same decision if the plot were less attractive?
- What is the one row or group I should inspect next?
- What would make me choose a different plot type?
Practice¶
- Make one histogram and explain what the shape says about the feature.
- Make one scatter plot with `alpha` and explain why overlap matters.
- Make one grouped bar plot and include the counts in the same reasoning.
- Make one box plot and explain what it tells you that the mean does not.
- Add one reference line and explain why it helps.
- Annotate one outlier and explain why it deserves attention.
- Compare one Matplotlib plot and one seaborn plot for the same question.
- Write one sentence describing the next row or slice you would inspect after the figure.
Runnable Example¶
Open the matching example in AI Academy and run it from the platform.
Inspect whether the figure supports the same story as the table that came before it.
Quick Checks¶
- If the table and plot disagree, inspect the table first.
- If points overlap too much, use transparency or a different plot type.
- If the plot has no order, sort the categories before plotting.
- If the labels are hard to read, simplify the figure or use tighter layout settings.
- If the figure cannot be explained in one sentence, the debug question is probably too broad.
Longer Connection¶
Continue with Python, NumPy, Pandas, Visualization for a fuller EDA workflow.