Detection and Segmentation¶
What This Is¶
This page is about the first structural decision in spatial vision: do you need
- classification
- object detection
- or segmentation
The answer is not about model prestige. The answer is about what the task must return: a label, a box, or a mask.
When You Use It¶
- finding objects in an image
- counting or localizing instances
- measuring boundaries or area
- deciding whether box-level localization is enough or pixel masks are necessary
Start With The Output Requirement¶
Use the task to choose the smallest correct structure:
| Need | Best first move | What to inspect |
|---|---|---|
| only image-level label | classifier | whether location actually matters |
| object location and count | detection | box quality, confidence threshold, missed small objects |
| precise boundaries or area | segmentation | mask quality, edge errors, cleanup rules |
| both object identity and precise region | instance segmentation | whether the extra mask quality changes the final decision |
If a box is enough, do not pay for masks. If a mask is required, a good class score is not enough.
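Concretely, the three output structures carry different amounts of information. This is a minimal sketch with made-up values, not any library's prediction format:

```python
# The three output structures, smallest to largest; all values here are made up.
classification = {"label": "cat"}                   # image-level: location ignored
detection = {"boxes": [[12, 30, 96, 150]],          # [x1, y1, x2, y2] per instance
             "labels": ["cat"], "scores": [0.91]}
segmentation = {"mask": [[0] * 4,                   # per-pixel class ids
                         [0, 1, 1, 0],
                         [0, 1, 1, 0],
                         [0] * 4]}

# Each structure answers a strictly richer question than the one before it,
# which is why the table says to pick the smallest one that answers yours.
print(len(detection["boxes"]), sum(map(sum, segmentation["mask"])))
```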
The Core Rule¶
A spatial model is only useful after three things are explicit:
- what counts as a valid prediction
- which overlap metric matches the task
- which threshold or cleanup rule turns raw scores into final outputs
That is why detection and segmentation are workflow topics, not just model topics.
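The third decision, the rule that turns raw scores into final outputs, can be sketched in a few lines. The function names, thresholds, and box values below are illustrative only; in a real torchvision pipeline the suppression step is usually `torchvision.ops.nms`:

```python
# A minimal sketch of the "raw scores -> final outputs" step: a score threshold,
# then greedy non-maximum suppression (NMS). Boxes are [x1, y1, x2, y2].
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def finalize(boxes, scores, score_thresh=0.5, iou_thresh=0.5):
    """Keep confident boxes, then drop near-duplicates of higher-scoring boxes."""
    kept = []
    candidates = sorted(
        (p for p in zip(boxes, scores) if p[1] >= score_thresh),
        key=lambda p: p[1], reverse=True,
    )
    for box, score in candidates:
        if all(iou(box, k[0]) < iou_thresh for k in kept):
            kept.append((box, score))
    return kept

# Two overlapping detections of one object, plus one weak detection elsewhere.
boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]]
scores = [0.9, 0.8, 0.3]
print(finalize(boxes, scores))  # one box survives: the duplicate and the weak box are dropped
```

Notice that both thresholds are policy choices, not model properties, which is exactly why this step belongs in the workflow.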
Minimal Pattern¶
This is the shortest detection inference pattern:
```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

with torch.no_grad():
    # image_tensor: a CxHxW float tensor with values in [0, 1]
    predictions = model([image_tensor])
```
That pattern only becomes meaningful after you decide which score threshold, overlap rule, and slice checks matter.
What To Inspect First¶
Before you chase a heavier detector, inspect:
- which predictions survive the confidence threshold
- whether small objects are missed first
- whether overlaps are duplicates or genuine nearby instances
- whether box IoU and mask IoU tell the same story
- whether post-processing changes the final answer more than the model change does
If those checks are missing, the workflow is still too blurry.
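The first two checks can start as a few lines of counting before any tooling. The function name and data here are illustrative; the 32x32 small-object cutoff follows the COCO evaluation convention:

```python
# A hypothetical inspection pass over one image's predictions: how many boxes
# survive the threshold, and whether the drops concentrate in small objects.
SMALL_AREA = 32 * 32  # COCO-style cutoff for "small" objects

def inspect(boxes, scores, score_thresh=0.5):
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    survivors = [b for b, s in zip(boxes, scores) if s >= score_thresh]
    dropped = [b for b, s in zip(boxes, scores) if s < score_thresh]
    return {
        "kept": len(survivors),
        "dropped": len(dropped),
        "dropped_small": sum(1 for b in dropped if area(b) < SMALL_AREA),
    }

boxes = [[0, 0, 200, 200], [10, 10, 25, 25], [40, 40, 55, 55]]
scores = [0.95, 0.30, 0.20]
print(inspect(boxes, scores))  # both drops are small boxes: a size-slice problem, not a global one
```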
Box Versus Mask Story¶
Use the smallest evaluation that matches the output:
- box IoU if the task is localization
- mask IoU if the task depends on precise shape
- threshold sweep if the operational cost depends on precision versus recall
Do not let one average score hide a bad spatial policy.
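Box IoU and mask IoU can disagree sharply on the same object, which is why the metric must match the output. A minimal sketch with a made-up L-shaped object:

```python
# Box IoU versus mask IoU on the same object. An L-shape fills only part of its
# bounding box, so a tight box can score perfectly while the mask does not.
def box_iou(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    return inter / (area(a) + area(b) - inter)

def mask_iou(a, b):
    """Masks represented as sets of (row, col) pixels."""
    return len(a & b) / len(a | b)

# Ground truth: an L-shape inside a 10x10 box. Prediction: the full 10x10 square.
gt_mask = {(r, c) for r in range(10) for c in range(10) if r >= 5 or c < 5}
pred_mask = {(r, c) for r in range(10) for c in range(10)}
print(box_iou([0, 0, 10, 10], [0, 0, 10, 10]))  # 1.0: the boxes match exactly
print(mask_iou(gt_mask, pred_mask))             # 0.75: a quarter of the box is background
```

If the task only needs the box, the 1.0 is the truth; if it needs area or shape, the 0.75 is.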
Good First Comparisons¶
Make one of these comparisons before changing the whole model:
- strict versus lenient confidence threshold
- raw prediction versus cleaned prediction
- box IoU versus mask IoU on the same scene
- classifier baseline versus detector, if localization might not actually matter
Those comparisons often reveal that the next fix is policy or cleanup, not architecture.
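The strict-versus-lenient comparison can be sketched on per-detection match labels. The detections and ground-truth count below are made up purely to show the shape of the trade:

```python
# A hypothetical strict-versus-lenient threshold sweep. Each detection carries
# its confidence and whether it matched a ground-truth object.
detections = [  # (confidence, is_true_positive)
    (0.95, True), (0.90, True), (0.70, True), (0.65, False),
    (0.50, True), (0.40, False), (0.30, False),
]
num_ground_truth = 5

for thresh in (0.8, 0.4):
    kept = [tp for conf, tp in detections if conf >= thresh]
    tp = sum(kept)
    precision = tp / len(kept)
    recall = tp / num_ground_truth
    print(f"threshold={thresh}: precision={precision:.2f} recall={recall:.2f}")
# Strict threshold: high precision, low recall. Lenient threshold: the reverse.
```

No model changed between those two lines of output; only the policy did.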
Failure Pattern¶
The common failure is assuming that a stronger detector is the first fix.
Often the real problem is simpler:
- the task was really classification
- the threshold was too loose or too strict
- post-processing was weak
- the weak slice was only small or overlapping objects
- mask quality mattered but only box quality was being watched
Common Mistakes¶
- using classification accuracy to judge a localization task
- comparing thresholds and model changes at the same time
- ignoring object-size slices
- using segmentation because it sounds more advanced when boxes would suffice
- trusting raw predictions without checking cleanup or non-maximum suppression
- calling a precision gain progress when coverage quietly collapsed
A Good Spatial Decision Note¶
After one run, the learner should be able to say:
- why the task needs detection or segmentation
- which overlap metric matches the real output
- what threshold or cleanup rule was chosen
- which slice failed first
- whether the next move is better policy, better cleanup, or a heavier model
Runnable Example¶
Use the local demo:
```
academy/.venv/bin/python academy/examples/detection-segmentation-demo.py
```
That demo is useful because it makes boxes, masks, scores, and IoU visible in one place before the full workflow gets heavier.
Practice¶
- Decide whether one task really needs detection or only classification.
- Compare box IoU and mask IoU on the same image.
- Sweep a confidence threshold and say what changed.
- Explain one case where cleanup beats a heavier model.
- Name one slice where segmentation would matter more than detection.
Longer Connection¶
Continue with Object Detection Basics for a simpler localization bridge, Vision Augmentation and Shift Robustness for the robustness side of vision workflows, and Detection and Segmentation Workflows for the full threshold-and-slice workflow.