Detection and Segmentation¶
What This Is¶
This page is about the first structural decision in spatial vision: do you need
- classification
- object detection
- or segmentation
The answer is not about model prestige. The answer is about what the task must return: a label, a box, or a mask.
When You Use It¶
- finding objects in an image
- counting or localizing instances
- measuring boundaries or area
- deciding whether box-level localization is enough or pixel masks are necessary
Start With The Output Requirement¶
Use the task to choose the smallest correct structure:
| Need | Best first move | What to inspect |
|---|---|---|
| only image-level label | classifier | whether location actually matters |
| object location and count | detection | box quality, confidence threshold, missed small objects |
| precise boundaries or area | segmentation | mask quality, edge errors, cleanup rules |
| both object identity and precise region | instance segmentation | whether the extra mask quality changes the final decision |
If a box is enough, do not pay for masks. If a mask is required, a good class score is not enough.
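Concretely, the three output structures carry different amounts of information. This is a minimal sketch with made-up values, not any library's prediction format:

```python
# The three output structures, smallest to largest; all values here are made up.
classification = {"label": "cat"}                   # image-level: location ignored
detection = {"boxes": [[12, 30, 96, 150]],          # [x1, y1, x2, y2] per instance
             "labels": ["cat"], "scores": [0.91]}
segmentation = {"mask": [[0] * 4,                   # per-pixel class ids
                         [0, 1, 1, 0],
                         [0, 1, 1, 0],
                         [0] * 4]}

# Each structure answers a strictly richer question than the one before it,
# which is why the table says to pick the smallest one that answers yours.
print(len(detection["boxes"]), sum(map(sum, segmentation["mask"])))
```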
The Core Rule¶
A spatial model is only useful after three things are explicit:
- what counts as a valid prediction
- which overlap metric matches the task
- which threshold or cleanup rule turns raw scores into final outputs
That is why detection and segmentation are workflow topics, not just model topics.
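The third decision, the rule that turns raw scores into final outputs, can be sketched in a few lines. The function names, thresholds, and box values below are illustrative only; in a real torchvision pipeline the suppression step is usually `torchvision.ops.nms`:

```python
# A minimal sketch of the "raw scores -> final outputs" step: a score threshold,
# then greedy non-maximum suppression (NMS). Boxes are [x1, y1, x2, y2].
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def finalize(boxes, scores, score_thresh=0.5, iou_thresh=0.5):
    """Keep confident boxes, then drop near-duplicates of higher-scoring boxes."""
    kept = []
    candidates = sorted(
        (p for p in zip(boxes, scores) if p[1] >= score_thresh),
        key=lambda p: p[1], reverse=True,
    )
    for box, score in candidates:
        if all(iou(box, k[0]) < iou_thresh for k in kept):
            kept.append((box, score))
    return kept

# Two overlapping detections of one object, plus one weak detection elsewhere.
boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]]
scores = [0.9, 0.8, 0.3]
print(finalize(boxes, scores))  # one box survives: the duplicate and the weak box are dropped
```

Notice that both thresholds are policy choices, not model properties, which is exactly why this step belongs in the workflow.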
Minimal Pattern¶
This is the shortest detection inference pattern:
```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

with torch.no_grad():
    # image_tensor: a CxHxW float tensor with values in [0, 1]
    predictions = model([image_tensor])
```
That pattern only becomes meaningful after you decide which score threshold, overlap rule, and slice checks matter.
What To Inspect First¶
Before you chase a heavier detector, inspect:
- which predictions survive the confidence threshold
- whether small objects are missed first
- whether overlaps are duplicates or genuine nearby instances
- whether box IoU and mask IoU tell the same story
- whether post-processing changes the final answer more than the model change does
If those checks are missing, the workflow is still too blurry.
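The first two checks can start as a few lines of counting before any tooling. The function name and data here are illustrative; the 32x32 small-object cutoff follows the COCO evaluation convention:

```python
# A hypothetical inspection pass over one image's predictions: how many boxes
# survive the threshold, and whether the drops concentrate in small objects.
SMALL_AREA = 32 * 32  # COCO-style cutoff for "small" objects

def inspect(boxes, scores, score_thresh=0.5):
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    survivors = [b for b, s in zip(boxes, scores) if s >= score_thresh]
    dropped = [b for b, s in zip(boxes, scores) if s < score_thresh]
    return {
        "kept": len(survivors),
        "dropped": len(dropped),
        "dropped_small": sum(1 for b in dropped if area(b) < SMALL_AREA),
    }

boxes = [[0, 0, 200, 200], [10, 10, 25, 25], [40, 40, 55, 55]]
scores = [0.95, 0.30, 0.20]
print(inspect(boxes, scores))  # both drops are small boxes: a size-slice problem, not a global one
```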
Box Versus Mask Story¶
Use the smallest evaluation that matches the output:
- box IoU if the task is localization
- mask IoU if the task depends on precise shape
- threshold sweep if the operational cost depends on precision versus recall
Do not let one average score hide a bad spatial policy.
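Box IoU and mask IoU can disagree sharply on the same object, which is why the metric must match the output. A minimal sketch with a made-up L-shaped object:

```python
# Box IoU versus mask IoU on the same object. An L-shape fills only part of its
# bounding box, so a tight box can score perfectly while the mask does not.
def box_iou(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    return inter / (area(a) + area(b) - inter)

def mask_iou(a, b):
    """Masks represented as sets of (row, col) pixels."""
    return len(a & b) / len(a | b)

# Ground truth: an L-shape inside a 10x10 box. Prediction: the full 10x10 square.
gt_mask = {(r, c) for r in range(10) for c in range(10) if r >= 5 or c < 5}
pred_mask = {(r, c) for r in range(10) for c in range(10)}
print(box_iou([0, 0, 10, 10], [0, 0, 10, 10]))  # 1.0: the boxes match exactly
print(mask_iou(gt_mask, pred_mask))             # 0.75: a quarter of the box is background
```

If the task only needs the box, the 1.0 is the truth; if it needs area or shape, the 0.75 is.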
Good First Comparisons¶
Make one of these comparisons before changing the whole model:
- strict versus lenient confidence threshold
- raw prediction versus cleaned prediction
- box IoU versus mask IoU on the same scene
- classifier baseline versus detector, if localization might not actually matter
Those comparisons often reveal that the next fix is policy or cleanup, not architecture.
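The strict-versus-lenient comparison can be sketched on per-detection match labels. The detections and ground-truth count below are made up purely to show the shape of the trade:

```python
# A hypothetical strict-versus-lenient threshold sweep. Each detection carries
# its confidence and whether it matched a ground-truth object.
detections = [  # (confidence, is_true_positive)
    (0.95, True), (0.90, True), (0.70, True), (0.65, False),
    (0.50, True), (0.40, False), (0.30, False),
]
num_ground_truth = 5

for thresh in (0.8, 0.4):
    kept = [tp for conf, tp in detections if conf >= thresh]
    tp = sum(kept)
    precision = tp / len(kept)
    recall = tp / num_ground_truth
    print(f"threshold={thresh}: precision={precision:.2f} recall={recall:.2f}")
# Strict threshold: high precision, low recall. Lenient threshold: the reverse.
```

No model changed between those two lines of output; only the policy did.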
Failure Pattern¶
The common failure is assuming that a stronger detector is the first fix.
Often the real problem is simpler:
- the task was really classification
- the threshold was too loose or too strict
- post-processing was weak
- the weak slice was only small or overlapping objects
- mask quality mattered but only box quality was being watched
Common Mistakes¶
- using classification accuracy to judge a localization task
- comparing thresholds and model changes at the same time
- ignoring object-size slices
- using segmentation because it sounds more advanced when boxes would suffice
- trusting raw predictions without checking cleanup or non-maximum suppression
- calling a precision gain progress when coverage quietly collapsed
A Good Spatial Decision Note¶
After one run, the learner should be able to say:
- why the task needs detection or segmentation
- which overlap metric matches the real output
- what threshold or cleanup rule was chosen
- which slice failed first
- whether the next move is better policy, better cleanup, or a heavier model
Runnable Example¶
Use the local demo:
```
academy/.venv/bin/python academy/examples/detection-segmentation-demo.py
```
That demo is useful because it makes boxes, masks, scores, and IoU visible in one place before the full workflow gets heavier.
Practice¶
- Decide whether one task really needs detection or only classification.
- Compare box IoU and mask IoU on the same image.
- Sweep a confidence threshold and say what changed.
- Explain one case where cleanup beats a heavier model.
- Name one slice where segmentation would matter more than detection.
Longer Connection¶
Continue with Object Detection Basics for a simpler localization bridge, Vision Augmentation and Shift Robustness for the robustness side of vision workflows, and Detection and Segmentation Workflows for the full threshold-and-slice workflow.