Track 16

Text Workflows Beyond Classification

This track pushes text work past plain intent labels: retrieval, reranking, constrained template selection, tokenization tradeoffs, and calibration-aware decisions under a fixed split.

Primary Goal

Rank Before You Pretend It Is Classification

The point is not to force every text problem into a label. The point is to decide when retrieval, reranking, and constrained selection are the real workflow.

Best For

Candidate Choice Under Shift

Use this track when the task is really about choosing among allowed candidates, surviving phrasing shift, and keeping the text decision auditable.

Exit Rule

One Stable Candidate-Selection Workflow

You are done when you can defend the retrieval baseline, the reranking gain, the tokenizer choice, and the response-library rule in one short note.

Use This Track When

  • the task is really candidate choice, not plain class prediction
  • you need retrieval and reranking on a fixed candidate set
  • tokenization choice changes what the model can see under phrasing shift
  • the final answer must stay inside a constrained response library or policy boundary

What This Track Is Training

This track trains one text decision ladder:

  • retrieve a candidate set first
  • rerank only if the first pass leaves useful ambiguity
  • keep the output library fixed while the scorer improves
  • compare tokenizers on the same split and the same shift
  • check calibration before turning the score into a thresholded decision
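The first three rungs of the ladder can be sketched in a few lines. This is a minimal illustration, not the track's lab code: the candidate library, the overlap scorers, and the example query are all hypothetical, and real retrieval would use a proper index rather than brute-force scoring. The structural point it shows is the one above: retrieve first, rerank only within the retrieved set, and never let the scorer change the fixed output library.

```python
from collections import Counter

# Hypothetical candidate library: the allowed responses stay fixed
# while the scorer improves.
CANDIDATES = [
    "reset your password from the account page",
    "contact billing support about a duplicate charge",
    "update the shipping address before the order ships",
    "cancel the subscription at the end of the cycle",
]

def word_overlap(query: str, candidate: str) -> float:
    """First-pass retrieval score: cheap bag-of-words overlap."""
    q, c = Counter(query.lower().split()), Counter(candidate.lower().split())
    return sum((q & c).values())

def char_ngram_overlap(query: str, candidate: str, n: int = 3) -> float:
    """Second-pass rerank score: character 3-gram overlap,
    more tolerant of inflection than whole words."""
    grams = lambda s: Counter(s[i:i + n] for i in range(len(s) - n + 1))
    return sum((grams(query.lower()) & grams(candidate.lower())).values())

def retrieve_then_rerank(query: str, k: int = 2) -> str:
    # Step 1: retrieve a small candidate set from the fixed library.
    top_k = sorted(CANDIDATES, key=lambda c: word_overlap(query, c),
                   reverse=True)[:k]
    # Step 2: rerank only within the retrieved set; the library never grows.
    return max(top_k, key=lambda c: char_ngram_overlap(query, c))

print(retrieve_then_rerank("how do I reset my account password"))
```

Because the reranker only reorders what retrieval already surfaced, a wrong answer here is immediately attributable to one stage or the other, which is what keeps the decision auditable.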

That means the learner should be able to keep these explicit:

  • whether the task is ranking, reranking, or constrained selection
  • whether the candidate set stayed fixed while the scorer changed
  • whether tokenization is the main source of robustness or fragility
  • whether the score is only useful for ordering or also trustworthy enough for thresholding

First Session

Use this order:

  1. Text Representations and Order
  2. Calibration and Thresholds
  3. run academy/.venv/bin/python academy/examples/text-beyond-classification-recipes/retrieval_rerank_demo.py
  4. run academy/.venv/bin/python academy/examples/text-beyond-classification-recipes/template_selection_demo.py
  5. write one short note on whether retrieval alone is enough or whether reranking or constraints changed the decision

Full Track Loop

For the complete workflow:

  1. review the text-representation and calibration topics before changing the scorer
  2. run the retrieval/reranking example and the template-selection example on the same task idea
  3. install the lab requirements with academy/.venv/bin/python -m pip install -r academy/labs/text-workflows-beyond-classification/requirements.txt
  4. run the full lab with academy/.venv/bin/python academy/labs/text-workflows-beyond-classification/src/text_workflows_beyond_classification.py
  5. finish the matching exercises in academy/exercises/text-workflows-beyond-classification/
  6. keep one short note with the retrieval baseline, reranking gain, tokenizer choice, and response-library rule

What To Inspect

By the end of the track, the learner should have inspected:

  • whether retrieval gets the right candidate into the top set before reranking
  • whether reranking changes the actual top choice instead of only smoothing the score table
  • whether word, character, and hashing tokenizers behave differently under phrasing shift
  • whether the constrained response library still covers the task without hiding a policy problem
  • whether calibration changes the threshold story even when the ranking already looks usable
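The last inspection point, calibration, reduces to one table: bucket the scores and compare each bucket's mean score to its observed accuracy. The score/correctness pairs below are made up for illustration; in the track they would come from the fixed split. A large gap in the top bucket is the signal that the score still orders candidates fine but is not yet trustworthy behind a hard threshold.

```python
# Hypothetical (score, correct) pairs for a scorer's top choice.
results = [
    (0.95, True), (0.91, True), (0.88, True), (0.85, False),
    (0.72, True), (0.68, False), (0.61, True), (0.55, False),
    (0.42, False), (0.38, True), (0.31, False), (0.22, False),
]

def reliability_table(pairs, bins=3):
    """Compare mean score to observed accuracy per equal-width bin."""
    rows = []
    for b in range(bins):
        left, right = b / bins, (b + 1) / bins
        bucket = [(s, ok) for s, ok in pairs
                  if left <= s < right or (b == bins - 1 and s == 1.0)]
        if not bucket:
            continue
        mean_score = sum(s for s, _ in bucket) / len(bucket)
        accuracy = sum(ok for _, ok in bucket) / len(bucket)
        rows.append((round(mean_score, 2), round(accuracy, 2), len(bucket)))
    return rows

for mean_score, accuracy, n in reliability_table(results):
    print(f"bin mean score {mean_score:.2f} -> accuracy {accuracy:.2f} (n={n})")
```

With the toy data above, the top bin's mean score sits above its accuracy, which is exactly the overconfidence pattern that should block thresholding until low-confidence cases have been inspected.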

Common Failure Modes

  • using a classifier where a ranking stage is the real task
  • changing the candidate set and the reranker at the same time
  • comparing tokenizers on different splits or different shift definitions
  • treating the constrained output library as safe without checking whether it still covers the task
  • using thresholded decisions before low-confidence cases have been inspected

Exit Standard

Before leaving this track, the learner should be able to:

  • explain the difference between retrieval, reranking, and constrained selection
  • defend a tokenizer choice under a specific phrasing shift
  • show whether calibration changed the decision story or only the confidence story
  • keep the response library fixed while improving the ranking rule
  • say what evidence would make another text iteration worth the time

That is enough to move into Problem Adaptation and Post-Processing or another advanced decision workflow.