Cutting LLM Evaluation Costs with SySRs: A Bandit Algorithm that Provably Exploits Model Similarity
Abstract: Large Language Models are typically benchmarked by evaluating every model on every test query. For practitioners seeking the best model to deploy, this is often wasteful: if a model clearly performs worse than others, there is no need to precisely estimate its performance. Best-arm identification algorithms can be naturally applied to drastically reduce costs by adaptively allocating evaluation budget. Further, language models often respond similarly to the same prompt, a property that previous work has tried to leverage with mixed success. We propose Synchronized Successive Rejects (SySRs), augmenting the classical Successive Rejects algorithm with paired comparisons. Unlike prior attempts to leverage model similarity in best-model identification, our approach is hyperparameter-free and enjoys performance guarantees that improve with the degree of similarity between evaluated models. Empirically, our method outperforms all baselines in terms of average error rate across 15 standard benchmarks, and in terms of worst-case budget for reliably identifying the best model.
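For orientation, the classical Successive Rejects baseline that the abstract builds on can be sketched as follows. This is a minimal illustration of the classical algorithm only, not of SySRs: the paired-comparison rule is not described in the abstract and is not reproduced here. The function name `successive_rejects`, the precomputed `scores` matrix, and the choice to evaluate all surviving models on one shared, shuffled query order are assumptions made for this example.

```python
import math
import random


def successive_rejects(scores, budget, seed=0):
    """Classical Successive Rejects on a precomputed score matrix.

    scores[m][q] is the (hypothetical) score of model m on query q.
    budget is the total number of (model, query) evaluations allowed.
    Returns (index of the recommended model, evaluations actually used).
    """
    num_models = len(scores)
    num_queries = len(scores[0])
    if budget < num_models:
        raise ValueError("budget must be at least the number of models")
    if num_models == 1:
        return 0, 0

    # log-bar(K) = 1/2 + sum_{i=2}^{K} 1/i determines the phase lengths.
    log_bar = 0.5 + sum(1.0 / i for i in range(2, num_models + 1))

    # Assumption for illustration: one shared random query order for all models.
    order = list(range(num_queries))
    random.Random(seed).shuffle(order)

    active = list(range(num_models))
    totals = [0.0] * num_models
    seen = 0  # queries evaluated so far by every surviving model
    used = 0  # total (model, query) evaluations spent

    for phase in range(1, num_models):
        # Cumulative number of queries per surviving model after this phase.
        target = math.ceil(
            (budget - num_models) / (log_bar * (num_models + 1 - phase))
        )
        # Never move backwards and never exceed the available queries.
        target = max(seen, min(target, num_queries))
        for q in order[seen:target]:
            for m in active:
                totals[m] += scores[m][q]
                used += 1
        seen = target
        # All surviving models saw the same queries, so comparing totals
        # is equivalent to comparing empirical means.
        worst = min(active, key=lambda m: totals[m])
        active.remove(worst)

    return active[0], used


if __name__ == "__main__":
    # Toy data: three models with constant scores on 200 queries.
    toy = [[1.0] * 200, [0.0] * 200, [0.5] * 200]
    best, used = successive_rejects(toy, budget=60)
    print(best, used)
```

The sketch drops the weakest surviving model after each phase, so clearly inferior models stop consuming evaluations early, which is the cost saving the abstract describes. How SySRs modifies the rejection step with paired comparisons is left to the paper.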
| Comments: | Published at ICML 2026 |
| Subjects: | Machine Learning (cs.LG) |
| Cite as: | arXiv:2606.07726 [cs.LG] (or arXiv:2606.07726v1 [cs.LG] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2606.07726 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Florian E. Dorner
[v1] Fri, 5 Jun 2026 17:03:19 UTC (20,858 KB)