Track 16
Text Workflows Beyond Classification
This track pushes text work past plain intent labels: retrieval, reranking, constrained template selection, tokenization tradeoffs, and calibration-aware decisions under a fixed split.
Primary Goal
Rank Before You Pretend It Is Classification
The point is not to force every text problem into a label. The point is to decide when retrieval, reranking, and constrained selection are the real workflow.
Best For
Candidate Choice Under Shift
Use this track when the task is really about choosing among allowed candidates, surviving phrasing shifts, and keeping the text decision auditable.
Exit Rule
One Stable Candidate-Selection Workflow
You are done when you can defend the retrieval baseline, the reranking gain, the tokenizer choice, and the response-library rule in one short note.
Use This Track When¶
- the task is really candidate choice, not plain class prediction
- you need retrieval and reranking on a fixed candidate set
- tokenization choice changes what the model can see under phrasing shift
- the final answer must stay inside a constrained response library or policy boundary
What This Track Is Training¶
This track trains one text decision ladder:
- retrieve a candidate set first
- rerank only if the first pass leaves ambiguity worth resolving
- keep the output library fixed while the scorer improves
- compare tokenizers on the same split and the same shift
- check calibration before turning the score into a thresholded decision
That means the learner should be able to keep these explicit:
- whether the task is ranking, reranking, or constrained selection
- whether the candidate set stayed fixed while the scorer changed
- whether tokenization is the main source of robustness or fragility
- whether the score is only useful for ordering or also trustworthy enough for thresholding
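The ladder above can be sketched in a few lines of plain Python. Everything here is illustrative: the overlap retriever, the `fine_score` penalty, and the sample response library are assumptions for the sketch, not the lab's actual code.

```python
# Sketch of the ladder: retrieve a candidate set first, then rerank the
# shortlist with a finer scorer while the response library stays fixed.

def retrieve(query_tokens, candidates, k=3):
    """First pass: score every candidate by raw token overlap, keep top k."""
    scored = [(len(set(query_tokens) & set(c.split())), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]

def rerank(query_tokens, shortlist):
    """Second pass: a toy finer score applied only to the shortlist.
    The candidate set is unchanged; only the scorer differs."""
    def fine_score(cand):
        tokens = cand.split()
        coverage = len(set(query_tokens) & set(tokens)) / max(len(query_tokens), 1)
        return coverage - 0.01 * len(tokens)  # reward coverage, penalize length
    return max(shortlist, key=fine_score)

# Hypothetical constrained response library: the answer must come from here.
library = [
    "reset your password from the account page",
    "contact support to reset your password",
    "update your billing address",
]
query = "how do i reset my password".split()
shortlist = retrieve(query, library)
answer = rerank(query, shortlist)
print(answer)
```

Note the separation of concerns: improving `fine_score` never widens the output space, which is exactly the "keep the output library fixed while the scorer improves" rule.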
First Session¶
Use this order:
- Text Representations and Order
- Calibration and Thresholds
- run `academy/.venv/bin/python academy/examples/text-beyond-classification-recipes/retrieval_rerank_demo.py`
- run `academy/.venv/bin/python academy/examples/text-beyond-classification-recipes/template_selection_demo.py`
- write one short note on whether retrieval alone is enough or whether reranking or constraints changed the decision
Full Track Loop¶
For the complete workflow:
- review the text-representation and calibration topics before changing the scorer
- run the retrieval/reranking example and the template-selection example on the same task idea
- install the lab requirements with `academy/.venv/bin/python -m pip install -r academy/labs/text-workflows-beyond-classification/requirements.txt`
- run the full lab with `academy/.venv/bin/python academy/labs/text-workflows-beyond-classification/src/text_workflows_beyond_classification.py`
- finish the matching exercises in `academy/exercises/text-workflows-beyond-classification/`
- keep one short note with the retrieval baseline, reranking gain, tokenizer choice, and response-library rule
What To Inspect¶
By the end of the track, the learner should have inspected:
- whether retrieval gets the right candidate into the top set before reranking
- whether reranking changes the actual top choice instead of only smoothing the score table
- whether word, character, and hashing tokenizers behave differently under phrasing shift
- whether the constrained response library still covers the task without hiding a policy problem
- whether calibration changes the threshold story even when the ranking already looks usable
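The tokenizer inspection above can be made concrete with a toy comparison: score the same original/shifted pair under word tokens and under character trigrams. The similarity function and the example phrases are made up for illustration; the lab compares real tokenizers on a real split.

```python
# Toy comparison: word vs character-trigram tokenization under a phrasing shift.

def word_tokens(text):
    return set(text.lower().split())

def char_trigrams(text):
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

original = "reset password"
shifted = "resetting my passwords"  # same intent, different surface form

w = jaccard(word_tokens(original), word_tokens(shifted))
c = jaccard(char_trigrams(original), char_trigrams(shifted))
print(f"word overlap: {w:.2f}, char-trigram overlap: {c:.2f}")
```

On this pair the word tokenizer sees zero overlap while character trigrams still connect the two phrasings, which is the kind of behavioral difference the inspection step is asking for; the point is to run both on the same split and the same shift definition before choosing.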
Common Failure Modes¶
- using a classifier where a ranking stage is the real task
- changing the candidate set and the reranker at the same time
- comparing tokenizers on different splits or different shift definitions
- treating the constrained output library as safe without checking whether it still covers the task
- using thresholded decisions before low-confidence cases have been inspected
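The last failure mode has a cheap guard: before turning scores into thresholded decisions, bucket held-out scores and look at empirical accuracy per bucket. The scores and labels below are fabricated for the sketch, assuming some held-out set where correctness of the top candidate is known.

```python
# Bucket scored predictions and check empirical accuracy per score range
# before committing to a threshold. All data here is fabricated.

scored = [  # (model score, was the top candidate actually correct)
    (0.95, True), (0.91, True), (0.88, True), (0.72, True),
    (0.70, False), (0.55, True), (0.52, False), (0.40, False),
]

def bucket_accuracy(items, lo, hi):
    """Empirical accuracy among predictions whose score falls in [lo, hi)."""
    hits = [correct for score, correct in items if lo <= score < hi]
    return sum(hits) / len(hits) if hits else None

for lo, hi in [(0.0, 0.5), (0.5, 0.9), (0.9, 1.01)]:
    print(f"[{lo:.1f}, {hi:.1f}): accuracy = {bucket_accuracy(scored, lo, hi)}")
```

Here the ranking looks usable (higher scores are more often correct), but the middle bucket is only 60% accurate, so a threshold at 0.5 would accept many wrong answers. That gap between the ordering story and the threshold story is what the calibration check is for.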
Exit Standard¶
Before leaving this track, the learner should be able to:
- explain the difference between retrieval, reranking, and constrained selection
- defend a tokenizer choice under a specific phrasing shift
- show whether calibration changed the decision story or only the confidence story
- keep the response library fixed while improving the ranking rule
- say what evidence would make another text iteration worth the time
That is enough to move into Problem Adaptation and Post-Processing or another advanced decision workflow.