Baseline Comparisondumbmodel.com
Reference catalog

Museum of Collapse

The failure modes we measure against. Every exhibit is a real way embeddings go wrong, with the diagnostic or gate that catches it before a model ships.

Anisotropy (the cone)

Geometry
Signature
effective rank ≪ d; mean pairwise cosine → 1; vectors crowd a narrow cone.
Cause
Contrastive training on too-similar pairs pushes outputs toward a dominant direction; the embedding space loses angular spread.
Detection
Effective-rank + mean-pairwise-similarity diagnostics (the free /check health check).
Hardening
eval-harness rankAboveBaseline gate

Dimension starvation

Geometry
Signature
singular-value spectrum falls off a cliff after a few dims; utilization = erank / dmax is low.
Cause
Only a handful of dimensions carry signal; the rest are near-zero noise, wasting storage and compute.
Detection
Space-utilization metric on the health check (erank vs max possible rank).

Mode collapse

Retrieval
Signature
distinct inputs map to near-identical vectors; top-k overlap is near-total across queries.
Cause
Loss minimized by mapping everything to a single region; the model ‘wins’ retrieval by returning the same chunks regardless of query.
Detection
nDCG@10 non-regression gate + collapse score on the panel.
Hardening
eval-harness ndcgNonRegression gate

MRL truncation cliff

Retrieval
Signature
mrlKnnTruncated ≪ mrlKnnFull; retrieval drops past tolerance when serving a prefix dim.
Cause
Matryoshka prefix dims weren’t trained jointly, so truncating for low-latency tier serve silently destroys recall.
Detection
mrlWithinTolerance gate fails closed when truncated KNN drops > 0.05 or below the 0.30 floor.
Hardening
eval-harness MRL gate (REV-905)

Thin-corpus false pass

Retrieval
Signature
< 5 real eval pairs; gate still reads green by falling back to hard-coded demo pairs.
Cause
A deploy gate measured on toy data passes models that have never been tested on real pairs. ‘deploy gate’ means nothing for thin corpora.
Detection
sufficientEvalPairs gate fails (not falls back) below a minimum real-pair count.
Hardening
eval-harness sufficientEvalPairs (REV-905)

Per-request model drift

Operations
Signature
same input text yields different embeddings across calls; indexing and serve diverge.
Cause
Checkpoint + tokenizer reloaded on every request (or per chunk): nondeterministic init, cache misses, and a trivial DoS surface.
Detection
Checkpoint LRU cache + deterministic serve path; health check reproducibility.
Hardening
checkpoint cache (REV-903)

Run the free health checkSee the reference panel