Track 16
Text Workflows Beyond Classification
This track pushes text work past plain intent labels: retrieval, reranking, constrained template selection, tokenization tradeoffs, and calibration-aware decisions under a fixed split.
Primary Goal
Rank Before You Pretend It Is Classification
The point is not to force every text problem into a label. The point is to decide when retrieval, reranking, and constrained selection are the real workflow.
Best For
Candidate Choice Under Shift
Use this track when the task is really about choosing among allowed candidates, surviving phrasing shifts, and keeping the text decision auditable.
Exit Rule
One Stable Candidate-Selection Workflow
You are done when you can defend the retrieval baseline, the reranking gain, the tokenizer choice, and the response-library rule in one short note.
Use This Track When¶
- the task is really candidate choice, not plain class prediction
- you need retrieval and reranking on a fixed candidate set
- tokenization choice changes what the model can see under phrasing shift
- the final answer must stay inside a constrained response library or policy boundary
What This Track Is Training¶
This track trains one text decision ladder:
- retrieve a candidate set first
- rerank only if the first pass leaves ambiguity worth resolving
- keep the output library fixed while the scorer improves
- compare tokenizers on the same split and the same shift
- check calibration before turning the score into a thresholded decision
That means the learner should be able to keep these explicit:
- whether the task is ranking, reranking, or constrained selection
- whether the candidate set stayed fixed while the scorer changed
- whether tokenization is the main source of robustness or fragility
- whether the score is only useful for ordering or also trustworthy enough for thresholding
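The ladder above can be sketched in a few lines of plain Python. Everything here is illustrative: the overlap retriever, the `fine_score` penalty, and the sample response library are assumptions for the sketch, not the lab's actual code.

```python
# Sketch of the ladder: retrieve a candidate set first, then rerank the
# shortlist with a finer scorer while the response library stays fixed.

def retrieve(query_tokens, candidates, k=3):
    """First pass: score every candidate by raw token overlap, keep top k."""
    scored = [(len(set(query_tokens) & set(c.split())), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]

def rerank(query_tokens, shortlist):
    """Second pass: a toy finer score applied only to the shortlist.
    The candidate set is unchanged; only the scorer differs."""
    def fine_score(cand):
        tokens = cand.split()
        coverage = len(set(query_tokens) & set(tokens)) / max(len(query_tokens), 1)
        return coverage - 0.01 * len(tokens)  # reward coverage, penalize length
    return max(shortlist, key=fine_score)

# Hypothetical constrained response library: the answer must come from here.
library = [
    "reset your password from the account page",
    "contact support to reset your password",
    "update your billing address",
]
query = "how do i reset my password".split()
shortlist = retrieve(query, library)
answer = rerank(query, shortlist)
print(answer)
```

Note the separation of concerns: improving `fine_score` never widens the output space, which is exactly the "keep the output library fixed while the scorer improves" rule.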
First Session¶
Use this order:
- Text Representations and Order
- Calibration and Thresholds
- run `academy/.venv/bin/python academy/examples/text-beyond-classification-recipes/retrieval_rerank_demo.py`
- run `academy/.venv/bin/python academy/examples/text-beyond-classification-recipes/template_selection_demo.py`
- write one short note on whether retrieval alone is enough or whether reranking or constraints changed the decision
Full Track Loop¶
For the complete workflow:
- review the text-representation and calibration topics before changing the scorer
- run the retrieval/reranking example and the template-selection example on the same task idea
- install the lab requirements with `academy/.venv/bin/python -m pip install -r academy/labs/text-workflows-beyond-classification/requirements.txt`
- run the full lab with `academy/.venv/bin/python academy/labs/text-workflows-beyond-classification/src/text_workflows_beyond_classification.py`
- finish the matching exercises in `academy/exercises/text-workflows-beyond-classification/`
- keep one short note with the retrieval baseline, reranking gain, tokenizer choice, and response-library rule
What To Inspect¶
By the end of the track, the learner should have inspected:
- whether retrieval gets the right candidate into the top set before reranking
- whether reranking changes the actual top choice instead of only smoothing the score table
- whether word, character, and hashing tokenizers behave differently under phrasing shift
- whether the constrained response library still covers the task without hiding a policy problem
- whether calibration changes the threshold story even when the ranking already looks usable
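The tokenizer inspection above can be made concrete with a toy comparison: score the same original/shifted pair under word tokens and under character trigrams. The similarity function and the example phrases are made up for illustration; the lab compares real tokenizers on a real split.

```python
# Toy comparison: word vs character-trigram tokenization under a phrasing shift.

def word_tokens(text):
    return set(text.lower().split())

def char_trigrams(text):
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

original = "reset password"
shifted = "resetting my passwords"  # same intent, different surface form

w = jaccard(word_tokens(original), word_tokens(shifted))
c = jaccard(char_trigrams(original), char_trigrams(shifted))
print(f"word overlap: {w:.2f}, char-trigram overlap: {c:.2f}")
```

On this pair the word tokenizer sees zero overlap while character trigrams still connect the two phrasings, which is the kind of behavioral difference the inspection step is asking for; the point is to run both on the same split and the same shift definition before choosing.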
Common Failure Modes¶
- using a classifier where a ranking stage is the real task
- changing the candidate set and the reranker at the same time
- comparing tokenizers on different splits or different shift definitions
- treating the constrained output library as safe without checking whether it still covers the task
- using thresholded decisions before low-confidence cases have been inspected
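The last failure mode has a cheap guard: before turning scores into thresholded decisions, bucket held-out scores and look at empirical accuracy per bucket. The scores and labels below are fabricated for the sketch, assuming some held-out set where correctness of the top candidate is known.

```python
# Bucket scored predictions and check empirical accuracy per score range
# before committing to a threshold. All data here is fabricated.

scored = [  # (model score, was the top candidate actually correct)
    (0.95, True), (0.91, True), (0.88, True), (0.72, True),
    (0.70, False), (0.55, True), (0.52, False), (0.40, False),
]

def bucket_accuracy(items, lo, hi):
    """Empirical accuracy among predictions whose score falls in [lo, hi)."""
    hits = [correct for score, correct in items if lo <= score < hi]
    return sum(hits) / len(hits) if hits else None

for lo, hi in [(0.0, 0.5), (0.5, 0.9), (0.9, 1.01)]:
    print(f"[{lo:.1f}, {hi:.1f}): accuracy = {bucket_accuracy(scored, lo, hi)}")
```

Here the ranking looks usable (higher scores are more often correct), but the middle bucket is only 60% accurate, so a threshold at 0.5 would accept many wrong answers. That gap between the ordering story and the threshold story is what the calibration check is for.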
Exit Standard¶
Before leaving this track, the learner should be able to:
- explain the difference between retrieval, reranking, and constrained selection
- defend a tokenizer choice under a specific phrasing shift
- show whether calibration changed the decision story or only the confidence story
- keep the response library fixed while improving the ranking rule
- say what evidence would make another text iteration worth the time
That is enough to move into Problem Adaptation and Post-Processing or another advanced decision workflow.