Unsupervised Learning and Representation¶
This pack is about grouping, projection, geometry, and feature reuse. The questions are adapted from public official course materials and rewritten into academy form.
Use this pack when the main uncertainty is geometric judgment: grouping, projection, or reuse decisions without labels doing all the work.
Use This Pack Cold¶
Do not scroll for passive review.
- Spend 45 to 90 seconds on each prompt before opening the answer.
- State the object of judgment first: cluster step, PCA direction, visualization limit, or reuse choice.
- Give one reason tied to geometry or downstream use.
- If two prompts in the same theme feel weak, stop and route back before finishing the pack.
Cold answer shape for this pack:
- clustering prompts: state the operation or decision rule, then the geometric reason
- PCA or visualization prompts: state what the method preserves, then what it does not prove
- representation prompts: choose the safer reuse path, then tie it to downstream evaluation
QU01. One K-Means Move¶
Question: Why does one k-means iteration always have two phases rather than one?
Cold target: Name both phases in order.
Reveal Academy Answer
- Because the k-means objective depends on two coupled unknowns: point assignments and center positions, and each is easy to optimize only while the other is held fixed.
- First it assigns each point to its nearest center.
- Then it recomputes each center as the mean of the assigned points.
- Assignment and averaging are different subproblems, so k-means alternates between them.
Why it matters: If you can say the update clearly, the code becomes easier to debug.
Source family: Stanford CS229 unsupervised-learning materials
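The two phases can be sketched directly in NumPy. This is a minimal illustration of one iteration, not a full implementation; the toy data and initial centers are made up.

```python
import numpy as np

# Hypothetical toy data: two loose groups in 2-D, plus guessed initial centers.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)),
               rng.normal(3.0, 0.3, (20, 2))])
centers = np.array([[0.5, 0.5], [2.5, 2.5]])

# Phase 1 (assignment): each point goes to its nearest center.
dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
labels = dists.argmin(axis=1)

# Phase 2 (averaging): each center becomes the mean of its assigned points.
new_centers = np.array([X[labels == k].mean(axis=0) for k in range(2)])
```

One full k-means run just repeats these two phases until the assignments stop changing.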
QU02. Should You Standardize First?¶
Question:
You want to cluster points using features measured in dollars, seconds, and counts. Is standardization a cosmetic step or a first real modeling decision?
Cold target: Pick cosmetic or real decision, then justify it with distance geometry.
Reveal Academy Answer
- It is a real modeling decision.
- Distance-based methods are scale-sensitive.
- Without standardization, a large-scale feature can dominate the geometry even if it is not the most meaningful feature.
Why it matters: Clustering quality often depends as much on representation as on the clustering algorithm.
Source family: Cornell CS4786 clustering themes and Stanford CS229 PCA notes
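A minimal sketch of the scale problem, assuming made-up dollar and second values. Before standardization the dollar axis dominates the distance almost entirely; after z-scoring, both features contribute comparably.

```python
import numpy as np

# Hypothetical points measured in (dollars, seconds).
X = np.array([[50_000.0, 1.0],
              [51_000.0, 9.0],
              [50_500.0, 5.0]])

# Raw Euclidean distance: the 1000-dollar gap swamps the 8-second gap.
raw_d = np.linalg.norm(X[0] - X[1])

# Standardize each feature to zero mean and unit variance (z-score).
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
std_d = np.linalg.norm(Xs[0] - Xs[1])
```

The raw distance is essentially the dollar difference alone, while the standardized distance reflects both features, which is why standardization is a modeling decision rather than a cosmetic one.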
QU03. First Principal Component¶
Question: If a cloud of points is elongated mostly along one diagonal direction, what should the first principal component capture?
Cold target: State the preserved quantity, not the class story.
Reveal Academy Answer
- The direction of greatest variance.
- PCA does not care about labels or class boundaries.
- It finds the axis that explains the most spread in the data.
Why it matters: PCA is a geometry tool, not a classifier.
Source family: UC Berkeley CS189 PCA materials and Stanford CS229 PCA notes
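This can be checked on synthetic data; the diagonal cloud and noise scale below are assumptions made for illustration. PCA via SVD recovers the diagonal as the first principal component, up to sign.

```python
import numpy as np

rng = np.random.default_rng(1)
# Cloud elongated along the (1, 1) diagonal: large spread along it, small noise off it.
t = rng.normal(0.0, 3.0, 500)
noise = rng.normal(0.0, 0.3, (500, 2))
X = np.outer(t, [1.0, 1.0]) / np.sqrt(2) + noise

Xc = X - X.mean(axis=0)                       # PCA centers the data first
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Vt[0]                                   # direction of greatest variance

diag = np.array([1.0, 1.0]) / np.sqrt(2)
alignment = abs(pc1 @ diag)                   # abs() because the sign is arbitrary
```

No labels appear anywhere in this computation, which is the point: PCA only sees spread.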
QU04. Can t-SNE Prove Cluster Count?¶
Question: A t-SNE plot shows three visually separated blobs. Does that prove the dataset contains exactly three real clusters?
Cold target: Say no first, then name what t-SNE is allowed to do.
Reveal Academy Answer
- No.
- t-SNE is useful for visualization, but it intentionally distorts global geometry.
- It can create visually persuasive separation that should not be used by itself as proof of cluster count or cluster quality.
Why it matters: Low-dimensional views are inspection tools, not final evidence.
Source family: UC Berkeley CS189 PCA materials and Stanford CS229 unsupervised-learning notes
QU05. EM Responsibilities And Variance¶
Question: In a Gaussian mixture, what happens to responsibilities when component variances shrink and means stay fixed?
Cold target: Name the direction of change and why it happens.
Reveal Academy Answer
- Responsibilities become sharper.
- Points become more strongly associated with the nearest compatible component.
- In the limit, soft assignments can approach hard assignments for well-separated regions.
Why it matters: This is the intuition behind why mixture models can behave more decisively when components tighten.
Source family: Stanford CS229 mixture-of-Gaussians and EM materials
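A 1-D sketch of the sharpening effect, assuming two equal-weight Gaussian components with means fixed at 0 and 4 and a query point at 1. Shrinking the shared variance pushes the soft assignment toward a hard one.

```python
import numpy as np

def responsibilities(x, means, sigma):
    # Posterior component probabilities for a point x, assuming equal mixing weights.
    dens = np.exp(-(x - np.asarray(means)) ** 2 / (2 * sigma ** 2)) / sigma
    return dens / dens.sum()

means = (0.0, 4.0)
x = 1.0                                   # closer to the first component

wide = responsibilities(x, means, sigma=3.0)   # broad components: soft split
tight = responsibilities(x, means, sigma=1.0)  # tight components: near-hard split
```

With sigma = 3 the point is shared between components, while with sigma = 1 nearly all the responsibility goes to the closer one.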
QU06. Better Clusters, Worse Downstream Task¶
Question: A new embedding produces cleaner unsupervised cluster plots, but a downstream classifier trained on those embeddings performs worse. Which result should you trust for deployment?
Cold target: Name the deployment criterion and reject the proxy.
Reveal Academy Answer
- Trust the downstream evaluation tied to the real task.
- Cleaner visualization is only indirect evidence.
- If deployment success depends on classification or retrieval, the embedding should be judged on that objective first.
Why it matters: Representation quality is task-dependent.
Source family: Stanford CS229 unsupervised-learning materials and UC Berkeley CS189 evaluation themes
QU07. Frozen Embeddings Or Train From Scratch?¶
Question: You have a tiny labeled dataset and a strong pretrained embedding model from a related domain. Is the safer first baseline to freeze the embedding or train the whole representation stack from scratch?
Cold target: Pick the safer baseline and name the variance story.
Reveal Academy Answer
- Freeze the pretrained embedding first.
- With little labeled data, training the whole representation from scratch usually has much higher variance.
- A frozen embedder plus a simple head gives a strong baseline and a clean point of comparison.
Why it matters: Representation reuse is often the highest-leverage move in small-data regimes.
Source family: Stanford CS229 unsupervised-learning materials and Stanford CS231n assignments
QU08. Which Cluster Count Wins?¶
Question: Cluster count k=5 gives slightly better inertia than k=4, but k=4 produces cleaner, more stable segments across random seeds and is easier to explain operationally. Which one should win automatically?
Cold target: Reject the automatic winner and name the real decision rule.
Reveal Academy Answer
- Neither wins automatically.
- Inertia alone is not the goal.
- If the operational use cares about stability and interpretability, k=4 may be the better choice even with slightly worse fit.
- The decision should be based on the downstream use of the clusters.
Why it matters: Unsupervised learning is full of proxy metrics. The proxy is not the mission.
Source family: Cornell CS4786 clustering themes and Stanford CS229 unsupervised-learning materials
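The inertia trap can be seen directly; below is a rough sketch using a minimal hand-rolled Lloyd's loop rather than any particular library, with made-up data that genuinely contains four groups. A larger k will fit the same data at least as tightly, which is exactly why inertia alone cannot pick the winner.

```python
import numpy as np

def kmeans_inertia(X, k, seed, iters=50):
    # Minimal Lloyd's loop, for illustration only: assign, then average, repeatedly.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    d = np.linalg.norm(X[:, None] - centers[None], axis=2)
    return (d.min(axis=1) ** 2).sum()       # within-cluster sum of squares

# Synthetic data with four real groups along the diagonal.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.5, (40, 2)) for c in (0.0, 4.0, 8.0, 12.0)])

# Best inertia over a few restarts for each k.
inertia4 = min(kmeans_inertia(X, 4, s) for s in range(5))
inertia5 = min(kmeans_inertia(X, 5, s) for s in range(5))
```

Here k=5 reaches a lower (better) inertia by splitting one of the four real groups, yet the 4-cluster solution matches the structure the data was built with. The number that should decide is the downstream one, not this one.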
What To Do After This Pack¶
Route by the kind of miss:
- 0-1 misses: move on to Advanced Unsupervised and Manifold Workflows and keep the same geometry-vs-proxy discipline.
- misses on QU01, QU02, QU03, or QU08: repair the clustering basics in Clustering and Low-Dimensional Views.
- misses on QU06 or QU07: route into Representation Reuse and Embedding Transfer before taking another unsupervised pack.
- repeated confusion about t-SNE or PCA: do not trust plots yet; rerun the low-dimensional topic and come back cold tomorrow.