Model comparison
Same query and corpus — evaluate retrieval quality across baseline and org-trained embedders.
Plain InfoNCE
Folded before lunch. · demo corpus
Dumbness Score: 96 · Maximum cone
erank 3.8 · nDCG@10 0.39
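For readers unfamiliar with the nDCG@10 figure above, here is a minimal sketch of how such a score is commonly computed. It assumes graded relevance labels listed in ranked order and the standard log2 discount. The function names `dcg_at_k` and `ndcg_at_k` are illustrative and not taken from this product. The ideal ranking is derived from the same list of labels, which is a simplification if a query has relevant documents that were never retrieved.

```python
import numpy as np

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k results (log2 discount)."""
    rel = np.asarray(relevances, dtype=np.float64)[:k]
    # Rank i (1-based) is discounted by log2(i + 1).
    discounts = np.log2(np.arange(2, rel.size + 2))
    return float((rel / discounts).sum())

def ndcg_at_k(relevances, k=10):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0.0:
        return 0.0  # no relevant results at all
    return dcg_at_k(relevances, k) / ideal
```

A perfectly ordered list scores 1.0; pushing relevant results down the ranking lowers the score.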
Top retrieval (demo corpus)
#1 · Spectral collapse in contrastive learning · 56%
When encoder outputs concentrate on a low-dimensional cone, effective rank drops toward 1.0 and retrieval diversity coll…
#2 · InfoNCE without surgery · 48%
Plain InfoNCE fine-tuning is the internal control. It often folds before domain adaptation finishes; that's the dumb bas…
#3 · Multi-hop RAG failure modes · 37%
Naive bi-encoder retrieval misses bridge documents. Query decomposition and HyDE-style augmentation recover nDCG on mult…
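The first result above describes effective rank dropping toward 1.0 when embeddings collapse onto a narrow cone. The sketch below shows one common way to measure it, assuming "erank" means the entropy-based effective rank (the exponential of the Shannon entropy of the normalized singular values). The page does not define erank, so this definition and the name `effective_rank` are assumptions rather than the product's actual metric.

```python
import numpy as np

def effective_rank(embeddings, eps=1e-12):
    """Entropy-based effective rank of an (n_samples, dim) embedding matrix."""
    x = np.asarray(embeddings, dtype=np.float64)
    # Singular values of the uncentered matrix, so a shared
    # cone direction shows up as a single dominant component.
    s = np.linalg.svd(x, compute_uv=False)
    p = s / (s.sum() + eps)   # normalize to a probability distribution
    p = p[p > eps]            # drop numerically zero components
    entropy = -(p * np.log(p)).sum()
    return float(np.exp(entropy))

# Fully collapsed: every embedding points in the same direction.
collapsed = np.outer(np.ones(50), np.array([1.0, 2.0, 3.0, 4.0]))
# Spread evenly across 8 orthogonal directions.
isotropic = np.eye(8)
```

Under this definition, `effective_rank(collapsed)` is close to 1 and `effective_rank(isotropic)` is close to 8, the full dimensionality.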
Blue Hen org model
Live pgvector · run search
Click "Search live corpus" to compare against your deployed org model.
Production org models via core-api. Certified benchmarks on Validation Lab.