Hugging Face Daily Papers · May 21, 2026 · 4 min read

Capturing LLM Capabilities via Evidence-Calibrated Query Clustering

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.


Query clustering organizes queries into groups that reflect shared latent capability demands, enabling capability-aware LLM evaluation. Existing clustering methods, which primarily rely on semantic taxonomies or embeddings, often fail to capture such latent capability requirements due to a misalignment between surface-level semantics and actual model performance. We propose ECC, an algorithm that calibrates prior semantic embeddings using limited posterior model comparisons to bridge the gap between surface-level semantics and latent capability requirements. ECC characterizes each cluster through a capability profile parameterized by a Bradley-Terry model and uses trainable mixture weights to accommodate queries with mixed capability demands, jointly learning a flexible, capability-aware clustering structure that supports query-specific inference of LLM capabilities. Extensive quantitative and qualitative evaluations demonstrate that ECC significantly improves LLM capability ranking quality, outperforming human-labeled and embedding-based baselines by an average of 17.64 and 18.02 percentage points, respectively, and proves effective in downstream tasks such as query routing.
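To make the abstract's two ingredients concrete, here is a minimal Python sketch of a Bradley-Terry win probability combined with per-query mixture weights over clusters. It is not the ECC algorithm: the abstract does not give the exact parameterization, the training objective, or how embeddings initialize the weights. The softmax mixture, the function names, the model names `model_x` and `model_y`, and all scores are assumptions chosen for illustration.

```python
import math

def bt_win_prob(score_a, score_b):
    """Bradley-Terry probability that A beats B, given log-strength scores."""
    return 1.0 / (1.0 + math.exp(score_b - score_a))

def softmax(logits):
    """Turn unconstrained logits into mixture weights that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mixture_win_prob(mix_logits, cluster_profiles, a, b):
    """Assumed form: weight each cluster's Bradley-Terry prediction by the
    query's mixture weight for that cluster, then sum."""
    weights = softmax(mix_logits)
    return sum(
        w * bt_win_prob(profile[a], profile[b])
        for w, profile in zip(weights, cluster_profiles)
    )

def route(mix_logits, cluster_profiles):
    """Hypothetical router: pick the model with the highest weighted score."""
    weights = softmax(mix_logits)
    totals = {}
    for w, profile in zip(weights, cluster_profiles):
        for model, score in profile.items():
            totals[model] = totals.get(model, 0.0) + w * score
    return max(totals, key=totals.get)

# Two hypothetical clusters with opposite capability profiles (made-up scores).
CLUSTER_PROFILES = [
    {"model_x": 1.0, "model_y": 0.0},  # cluster 0 favors model_x
    {"model_x": 0.0, "model_y": 1.0},  # cluster 1 favors model_y
]

# A query split evenly between the clusters: the two predictions cancel out.
balanced = mixture_win_prob([0.0, 0.0], CLUSTER_PROFILES, "model_x", "model_y")

# A query that leans toward cluster 0: model_x becomes the favorite.
skewed = mixture_win_prob([2.0, 0.0], CLUSTER_PROFILES, "model_x", "model_y")
best = route([2.0, 0.0], CLUSTER_PROFILES)
```

In this toy setup the mixture logits are fixed by hand. In a trainable version they would be fitted, along with the cluster scores, to observed pairwise model comparisons, which is the role the abstract assigns to the "limited posterior model comparisons".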


arxiv:2605.17110


Published on May 16, 2026

Submitted by Fangzhou Wu on May 21, 2026

University of Wisconsin-Madison


Authors: Fangzhou Wu, Sandeep Silwal, Qiuyi Zhang

AI-generated summary

ECC, a query clustering algorithm, improves LLM capability evaluation by aligning semantic embeddings with latent capability demands through posterior model comparisons and Bradley-Terry modeling.
