
Grouped Summaries and Slice Checks

What This Is

Grouped summaries help you move beyond one overall average. They show whether a model or dataset behaves differently across categories or slices, and they are one of the fastest ways to find hidden imbalance.

The main job is not to build a fancy table. The main job is to answer a simple question: which slice is actually different, and is it large enough to care about?

When You Use It

  • checking target rates across categories
  • finding suspicious subgroups
  • comparing aggregates with slice-level behavior
  • comparing more than one metric for the same slice
  • deciding whether a subgroup is large enough to trust
  • turning a raw table into a reportable summary

Tooling

  • groupby
  • agg
  • sort_values
  • reset_index
  • named aggregation
  • size
  • count
  • nunique
  • pivot_table
  • crosstab
  • transform
  • filter
  • value_counts

Most Useful Functions

groupby is the starting point. It splits the table by a key or keys and gives you a grouped object that can summarize each slice.

agg is the workhorse. It lets you compute means, counts, unique counts, and other summaries in one pass. Named aggregation is usually the cleanest form when you want readable column names.

size counts all rows in each group, including rows where the measured column is missing. count counts only non-null values in a specific column. That difference matters when missingness is part of the story.
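A minimal sketch of that difference, using a hypothetical toy frame with one missing value:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing score in the "email" channel.
df = pd.DataFrame({
    "channel": ["email", "email", "chat"],
    "score": [1.0, np.nan, 2.0],
})

counts = df.groupby("channel").agg(
    rows=("score", "size"),       # counts every row, NaN included
    non_null=("score", "count"),  # counts only non-null scores
)
print(counts)
# "email" has rows=2 but non_null=1, so one score is missing there.
```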

nunique tells you how many distinct values appear inside each group. It is useful when you want to know whether a category is repetitive or varied.

value_counts is the fastest way to inspect how common a category is. On a grouped Series, it can show the distribution inside each group.
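A short sketch of value_counts on a grouped Series, again on hypothetical data; normalize=True turns the counts into within-group proportions:

```python
import pandas as pd

# Hypothetical data: three email tickets, one chat ticket.
df = pd.DataFrame({
    "channel": ["email", "email", "email", "chat"],
    "issue_category": ["billing", "billing", "login", "billing"],
})

# Distribution of issue_category inside each channel.
dist = df.groupby("channel")["issue_category"].value_counts(normalize=True)
print(dist)
# "billing" is 2/3 of email tickets and all of chat tickets.
```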

pivot_table and crosstab are useful when you want a two-way summary that reads more like a report than a raw groupby result.

transform attaches a group statistic back to every row. That is useful when you want a row-level comparison against the slice mean.

filter keeps or drops whole groups based on a rule. That is useful when a slice is too small to trust.

Minimal Example

summary = df.groupby("channel", dropna=False).agg(
    review_rate=("needs_human_review", "mean"),
    n=("needs_human_review", "size"),
)

Worked Pattern

slice_summary = (
    df.groupby(["channel", "issue_category"], as_index=False, dropna=False)
    .agg(
        review_rate=("needs_human_review", "mean"),
        n=("needs_human_review", "size"),
        unique_agents=("agent_id", "nunique"),
    )
    .sort_values(["review_rate", "n"], ascending=[False, False])
)

That pattern is useful because it gives you:

  • the rate you want to inspect
  • the slice size you need to trust it
  • a uniqueness check that can reveal whether a group is dominated by the same few entities
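The size column also makes it easy to flag slices that are too small to trust. A sketch, assuming hypothetical data with the same column names as the pattern above and an arbitrary threshold of three rows:

```python
import pandas as pd

# Hypothetical data matching the worked pattern's columns.
df = pd.DataFrame({
    "channel": ["email"] * 4 + ["chat"],
    "issue_category": ["billing"] * 4 + ["login"],
    "needs_human_review": [1, 0, 1, 0, 1],
    "agent_id": ["a1", "a1", "a2", "a3", "a9"],
})

slice_summary = (
    df.groupby(["channel", "issue_category"], as_index=False)
    .agg(
        review_rate=("needs_human_review", "mean"),
        n=("needs_human_review", "size"),
        unique_agents=("agent_id", "nunique"),
    )
)

# Flag small slices instead of silently trusting their rates.
# The threshold of 3 is an arbitrary illustration, not a rule.
slice_summary["low_confidence"] = slice_summary["n"] < 3
print(slice_summary)
# The chat/login slice has a perfect rate but only one row behind it.
```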

If you want a two-way report, pivot_table and crosstab are often easier to read:

rate_table = pd.pivot_table(
    df,
    values="needs_human_review",
    index="channel",
    columns="issue_category",
    aggfunc="mean",
    margins=True,
)

count_table = pd.crosstab(
    df["channel"],
    df["issue_category"],
    margins=True,
)

Use pivot_table when you want an aggregated value in a matrix layout. Use crosstab when you want a frequency table or a compact two-way comparison built directly from raw columns.
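crosstab can also normalize its counts into proportions, which often reads better than raw frequencies. A sketch on hypothetical data; normalize="index" makes each row sum to one:

```python
import pandas as pd

# Hypothetical data: two channels, two issue categories.
df = pd.DataFrame({
    "channel": ["email", "email", "chat", "chat"],
    "issue_category": ["billing", "login", "billing", "billing"],
})

# Within-channel shares of each issue category.
shares = pd.crosstab(df["channel"], df["issue_category"], normalize="index")
print(shares)
# chat is all billing; email splits evenly between billing and login.
```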

Leave unobserved combinations as NaN when the table shows rates, and avoid fill_value=0 there. NaN means “no rows for this slice,” not “observed zero rate.”
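A small sketch of why that distinction matters, using hypothetical data where one channel/category combination never occurs:

```python
import pandas as pd

# Hypothetical data: chat never sees billing tickets.
df = pd.DataFrame({
    "channel": ["email", "email", "chat"],
    "issue_category": ["billing", "billing", "login"],
    "needs_human_review": [1, 0, 1],
})

rates = pd.pivot_table(
    df,
    values="needs_human_review",
    index="channel",
    columns="issue_category",
    aggfunc="mean",
)
print(rates)
# The chat/billing cell is NaN: no rows exist for that slice.
# Filling it with 0 would fabricate an "observed zero rate".
```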

transform is the row-level companion to group summaries:

df["channel_review_rate"] = df.groupby("channel")["needs_human_review"].transform("mean")
df["above_channel_average"] = df["needs_human_review"] > df["channel_review_rate"]

That pattern is useful when you want to compare each row against its own slice without losing row-level alignment.

filter is the cleanup tool for tiny groups:

large_slices = df.groupby("channel").filter(lambda g: len(g) >= 10)

That keeps only groups with enough rows to trust.
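An equivalent sketch without the Python lambda, using transform("size") to attach each group's row count and then a plain boolean mask; the data here is hypothetical:

```python
import pandas as pd

# Hypothetical data: 12 email rows and a single fax row.
df = pd.DataFrame({
    "channel": ["email"] * 12 + ["fax"],
    "needs_human_review": [0, 1] * 6 + [1],
})

# Same effect as groupby().filter(), but vectorized:
# keep rows whose group has at least 10 rows.
group_sizes = df.groupby("channel")["channel"].transform("size")
large_slices = df[group_sizes >= 10]
print(large_slices["channel"].unique())
# Only "email" survives; the one-row "fax" slice is dropped.
```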

What To Inspect

  • whether the highest rate is also supported by a decent count
  • whether size and count tell the same story
  • whether missing values are hiding inside a slice
  • whether the group key is too broad or too narrow
  • whether a second key changes the story completely
  • whether value_counts or a cross-tab reveals a pattern the mean hides
  • whether the table is easier to explain with named aggregation

If a slice looks suspicious, inspect its exact rows before trusting the summary. Group tables are a lens, not the proof itself.
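Drilling into a slice is usually a single boolean mask. A sketch on hypothetical data, pulling the rows behind a dramatic-looking channel before trusting its rate:

```python
import pandas as pd

# Hypothetical data: "fax" shows a perfect review rate in a summary.
df = pd.DataFrame({
    "channel": ["fax", "email", "email"],
    "needs_human_review": [1, 0, 1],
})

# Pull the exact rows behind the suspicious slice.
suspicious = df[df["channel"] == "fax"]
print(suspicious)
# One row: the 1.00 rate rests on n = 1, so it proves nothing yet.
```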

Common Mistakes

  • using count when you really wanted row counts from size
  • sorting by rate and ignoring the sample size
  • forgetting dropna=False when missing values are part of the story
  • using as_index=True and then fighting the index when a plain table would be easier to read
  • grouping by too many columns and making every slice tiny
  • reading pivot_table totals without checking the underlying counts

observed=True can also matter when grouping categorical columns because it keeps the output focused on observed categories. That often makes the table easier to read.
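A minimal sketch of that effect, assuming a hypothetical categorical column with unused categories:

```python
import pandas as pd

# Hypothetical data: the categorical declares three channels,
# but only "email" actually appears in the rows.
df = pd.DataFrame({
    "channel": pd.Categorical(
        ["email", "email"], categories=["email", "chat", "fax"]
    ),
    "needs_human_review": [1, 0],
})

# observed=True drops the unused "chat" and "fax" categories,
# so the summary only shows slices that exist in the data.
summary = df.groupby("channel", observed=True)["needs_human_review"].mean()
print(summary)
```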

Failure Pattern

Trusting a dramatic slice rate without checking the count. A slice with review_rate = 1.00 and n = 1 is not strong evidence.

Another failure pattern is collapsing the data into a single pivot and stopping there. If the summary looks interesting, the next step is usually to inspect the rows behind the slice, not to make the table prettier.

Quick Tricks

  • Use as_index=False when you want the result to read like a normal table.
  • Use dropna=False when missing values should be treated as an explicit slice.
  • Use size for row counts and count for non-null counts.
  • Use nunique when you need to know whether a slice is being driven by one repeated value.
  • Use value_counts(normalize=True) when proportions matter more than raw frequency.
  • Use transform when you want to compare each row against its group.

Practice

  1. Compute a grouped summary by one categorical column and include both size and count.
  2. Compute a two-column slice summary and sort by rate and count.
  3. Mark all slices with fewer than three rows as low confidence.
  4. Rewrite the summary with named aggregation and compare the readability.
  5. Build a pivot_table version and explain whether it makes the pattern easier or harder to read.
  6. Build a crosstab version and explain whether the row/column layout changes your conclusion.
  7. Use transform to attach a slice mean back onto the original table.
  8. Pick one slice that looks strong and explain why its count is enough, or not enough, to trust it.
  9. Find one group where size and count differ and explain what missingness is doing.
  10. Use filter to remove tiny groups, then explain what changed.

Runnable Example

Open the matching example in AI Academy and run it from the platform.

While reading the result, ask:

  • which slice has the highest rate
  • which slice has the best combination of rate and count
  • which slice should be treated as a warning rather than a conclusion
  • whether a second grouping key makes the story sharper or noisier
  • whether transform would help you compare rows against their slice

Inspect which slices have both a high rate and enough count to be worth discussing.

Questions To Ask

  1. Is the highest-rate slice also one of the largest slices?
  2. Does the result change when you add a second grouping key?
  3. Are missing values being counted or silently dropped?
  4. Would pivot_table or crosstab make the report easier to read?
  5. Which group statistic would you want to attach back to each row with transform?
  6. Which slice should be filtered out because it is too small to trust?
  7. What would change if you normalized the counts into proportions?

Longer Connection

Continue with Python, NumPy, Pandas, Visualization for a longer inspection workflow.