Detection and Segmentation

What This Is

This page is about the first structural decision in spatial vision: do you need

  • classification
  • object detection
  • or segmentation

The answer is not about model prestige. The answer is about what the task must return: a label, a box, or a mask.

When You Use It

  • finding objects in an image
  • counting or localizing instances
  • measuring boundaries or area
  • deciding whether box-level localization is enough or pixel masks are necessary

Start With The Output Requirement

Use the task to choose the smallest correct structure:

Need | Best first move | What to inspect
--- | --- | ---
only image-level label | classifier | whether location actually matters
object location and count | detection | box quality, confidence threshold, missed small objects
precise boundaries or area | segmentation | mask quality, edge errors, cleanup rules
both object identity and precise region | instance segmentation | whether the extra mask quality changes the final decision

If a box is enough, do not pay for masks. If a mask is required, a good class score is not enough.

The Core Rule

A spatial model is only useful after three things are explicit:

  1. what counts as a valid prediction
  2. which overlap metric matches the task
  3. which threshold or cleanup rule turns raw scores into final outputs

That is why detection and segmentation are workflow topics, not just model topics.
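The overlap metric in step 2 is usually intersection over union. A minimal sketch of box IoU for axis-aligned `(x1, y1, x2, y2)` boxes (the function name and box format here are illustrative, not from any particular library):

```python
def box_iou(a, b):
    # Intersection rectangle between boxes a and b.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    # Union = sum of areas minus the double-counted intersection.
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two 10x10 boxes overlapping by half give an IoU of about 0.33,
# which is why "valid prediction" must name an explicit IoU cutoff.
iou = box_iou((0, 0, 10, 10), (5, 0, 15, 10))
```

Whether 0.33 counts as a hit is exactly the kind of policy decision step 1 demands.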

Minimal Pattern

This is the shortest detection inference pattern:

import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Pretrained detector; "DEFAULT" selects the best available weights.
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

# image_tensor: a CHW float tensor with values in [0, 1].
with torch.no_grad():
    predictions = model([image_tensor])

# predictions[0] is a dict with "boxes", "labels", and "scores".

That pattern only becomes meaningful after you decide what score threshold, overlap rule, and slice checks matter.
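As a hedged sketch of that decision, here is a threshold filter over a prediction dict that mimics the torchvision output shape, using plain Python lists instead of tensors (the helper name and toy values are illustrative):

```python
# Keep only predictions whose confidence clears the threshold,
# preserving the {"boxes", "labels", "scores"} structure.
def filter_by_score(pred, threshold):
    keep = [i for i, s in enumerate(pred["scores"]) if s >= threshold]
    return {k: [pred[k][i] for i in keep] for k in ("boxes", "labels", "scores")}

pred = {
    "boxes": [(0, 0, 10, 10), (5, 5, 20, 20), (30, 30, 40, 40)],
    "labels": [1, 1, 2],
    "scores": [0.92, 0.55, 0.30],
}
strict = filter_by_score(pred, 0.8)   # keeps one prediction
lenient = filter_by_score(pred, 0.5)  # keeps two predictions
```

The model call is identical in both cases; the threshold alone changed the answer.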

What To Inspect First

Before you chase a heavier detector, inspect:

  • which predictions survive the confidence threshold
  • whether small objects are missed first
  • whether overlaps are duplicates or genuine nearby instances
  • whether box IoU and mask IoU tell the same story
  • whether post-processing changes the final answer more than the model change does

If those checks are missing, the workflow is still too blurry.
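The small-object check above can be sketched as a slice count. This toy helper buckets ground-truth boxes by area, assuming COCO-style cutoffs of 32² and 96² pixels; the function names and the matched-flag input are illustrative:

```python
# Bucket a (x1, y1, x2, y2) box by area using COCO-style size cutoffs.
def size_bucket(box):
    area = (box[2] - box[0]) * (box[3] - box[1])
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"

# Count unmatched ground-truth boxes per size slice.
def misses_by_size(gt_boxes, matched_flags):
    counts = {"small": 0, "medium": 0, "large": 0}
    for box, matched in zip(gt_boxes, matched_flags):
        if not matched:
            counts[size_bucket(box)] += 1
    return counts
```

If the "small" count dominates, the weak slice is object size, not the detector as a whole.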

Box Versus Mask Story

Use the smallest evaluation that matches the output:

  • box IoU if the task is localization
  • mask IoU if the task depends on precise shape
  • threshold sweep if the operational cost depends on precision versus recall

Do not let one average score hide a bad spatial policy.
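Mask IoU follows the same intersection-over-union arithmetic as box IoU, just over pixels. A minimal sketch, with toy masks stored as sets of `(row, col)` foreground pixels (real masks are boolean arrays, but the ratio is the same):

```python
# Mask IoU: shared foreground pixels over total foreground pixels.
def mask_iou(mask_a, mask_b):
    inter = len(mask_a & mask_b)
    union = len(mask_a | mask_b)
    return inter / union if union else 0.0

a = {(r, c) for r in range(4) for c in range(4)}     # 4x4 square
b = {(r, c) for r in range(4) for c in range(2, 6)}  # same square, shifted right
iou = mask_iou(a, b)  # half-overlapping squares, IoU around 0.33
```

Two predictions can have identical box IoU but very different mask IoU when shapes are irregular, which is exactly when the mask metric earns its cost.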

Good First Comparisons

Make one of these comparisons before changing the whole model:

  • strict versus lenient confidence threshold
  • raw prediction versus cleaned prediction
  • box IoU versus mask IoU on the same scene
  • classifier baseline versus detector, if localization might not actually matter

Those comparisons often reveal that the next fix is policy or cleanup, not architecture.
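The strict-versus-lenient comparison above can be run as a small sweep. This sketch assumes each toy prediction is a `(confidence, is_true_positive)` pair and that the ground-truth object count `n_gt` is known; all names are illustrative:

```python
# For each threshold, report (threshold, precision, recall)
# over predictions given as (confidence, is_true_positive) pairs.
def sweep(preds, n_gt, thresholds):
    rows = []
    for t in thresholds:
        kept = [tp for score, tp in preds if score >= t]
        tp = sum(kept)
        precision = tp / len(kept) if kept else 1.0
        recall = tp / n_gt
        rows.append((t, precision, recall))
    return rows

preds = [(0.9, True), (0.8, True), (0.6, False), (0.4, True)]
rows = sweep(preds, n_gt=3, thresholds=[0.5, 0.7])
```

Here raising the threshold from 0.5 to 0.7 lifts precision without touching recall, a policy gain that no architecture change was needed for.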

Failure Pattern

The common failure is assuming that a stronger detector is the first fix.

Often the real problem is simpler:

  • the task was really classification
  • the threshold was too loose or too strict
  • post-processing was weak
  • the weak slice was only small or overlapping objects
  • mask quality mattered but only box quality was being watched

Common Mistakes

  • using classification accuracy to judge a localization task
  • comparing thresholds and model changes at the same time
  • ignoring object-size slices
  • using segmentation because it sounds more advanced when boxes would suffice
  • trusting raw predictions without checking cleanup or non-maximum suppression
  • calling a precision gain progress when coverage quietly collapsed
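The non-maximum suppression check in that list can be made concrete. A greedy NMS sketch over `(x1, y1, x2, y2)` boxes (torchvision's `torchvision.ops.nms` does the same job on tensors; this plain-Python version is only for illustration):

```python
# IoU between two axis-aligned boxes.
def box_iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

# Greedy NMS: keep each box only if it does not overlap an
# already-kept, higher-scoring box beyond iou_thresh.
def nms(boxes, scores, iou_thresh=0.5):
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(box_iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep
```

With two nearly identical boxes and one distant box, NMS drops the duplicate and keeps the rest, which is why a raw prediction count can overstate how many objects were actually found.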

A Good Spatial Decision Note

After one run, the learner should be able to say:

  • why the task needs detection or segmentation
  • which overlap metric matches the real output
  • what threshold or cleanup rule was chosen
  • which slice failed first
  • whether the next move is better policy, better cleanup, or a heavier model

Runnable Example

Use the local demo:

academy/.venv/bin/python academy/examples/detection-segmentation-demo.py

That demo is useful because it makes boxes, masks, scores, and IoU visible in one place before the full workflow gets heavier.

Practice

  1. Decide whether one task really needs detection or only classification.
  2. Compare box IoU and mask IoU on the same image.
  3. Sweep a confidence threshold and say what changed.
  4. Explain one case where cleanup beats a heavier model.
  5. Name one slice where segmentation would matter more than detection.

Longer Connection

Continue with Object Detection Basics for a simpler localization bridge, Vision Augmentation and Shift Robustness for the robustness side of vision workflows, and Detection and Segmentation Workflows for the full threshold-and-slice workflow.