arXiv: 2605.11518 · University of Notre Dame
Authors: Taicheng Guo, Nitesh V. Chawla, Olaf Wiest, Xiangliang Zhang
GitHub: https://github.com/taichengguo/AutoLLMResearch
AutoLLMResearch: Training Research Agents for Automating LLM Experiment Configuration -- Learning from Cheap, Optimizing Expensive
Abstract
An agentic framework called AutoLLMResearch automates high-cost large language model experiment configurations by learning from multi-fidelity experimental environments and enabling efficient configuration identification through cross-fidelity extrapolation.
AI-generated summary
Effectively configuring scalable large language model (LLM) experiments, spanning architecture design, hyperparameter tuning, and beyond, is crucial for advancing LLM research, as poor configuration choices can waste substantial computational resources and prevent models from realizing their full potential. Prior automated methods are designed for low-cost settings where repeated trial and error is feasible, but scalable LLM experiments are too expensive for such extensive iteration. To our knowledge, no work has addressed the automation of high-cost LLM experiment configurations, leaving this problem labor-intensive and dependent on expert intuition. Motivated by this gap, we propose AutoLLMResearch, an agentic framework that mimics how human researchers learn generalizable principles from low-fidelity experiments and extrapolate from them to efficiently identify promising configurations in expensive LLM settings. The core challenge is enabling an agent to learn through interaction with a multi-fidelity experimental environment that captures the structure of the LLM configuration landscape. To achieve this, we propose a systematic framework with two key components: 1) LLMConfig-Gym, a multi-fidelity environment encompassing four critical LLM experiment tasks, supported by over one million GPU hours of verifiable experiment outcomes; 2) a structured training pipeline that formulates configuration research as a long-horizon Markov Decision Process and accordingly incentivizes cross-fidelity extrapolation reasoning. Extensive evaluation against diverse strong baselines on held-out experiments demonstrates the effectiveness, generalization, and interpretability of our framework, supporting its potential as a practical and general solution for scalable real-world LLM experiment automation.
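The abstract's framing, a long-horizon MDP over a multi-fidelity environment where cheap low-fidelity runs inform a few expensive high-fidelity ones, can be sketched in miniature as below. This is an illustrative toy, not the paper's LLMConfig-Gym API: the class and method names (`MultiFidelityConfigEnv`, `step`, `cheap_then_expensive`), the cost and noise numbers, and the hidden objective are all assumptions made for the sake of the example.

```python
import random

class MultiFidelityConfigEnv:
    """Toy multi-fidelity configuration environment (illustrative only).

    A hypothetical stand-in for an LLMConfig-Gym-style setting: the agent
    proposes a scalar configuration x and a fidelity level; low-fidelity
    evaluations are cheap but noisy, high-fidelity ones are expensive but
    accurate. The episode ends when the budget runs out.
    """

    COSTS = {"low": 1.0, "high": 10.0}   # assumed relative costs (e.g. GPU hours)
    NOISE = {"low": 0.30, "high": 0.02}  # assumed observation noise per fidelity

    def __init__(self, budget=30.0, seed=0):
        self.budget = budget
        self.rng = random.Random(seed)
        self.history = []  # the MDP state: all (x, fidelity, score) observed so far

    def _true_score(self, x):
        # Hidden objective; the best configuration sits at x = 0.7.
        return 1.0 - (x - 0.7) ** 2

    def step(self, x, fidelity):
        """One MDP transition: run an experiment, pay its cost, observe a score."""
        cost = self.COSTS[fidelity]
        if cost > self.budget:
            raise ValueError("insufficient budget for this fidelity")
        self.budget -= cost
        score = self._true_score(x) + self.rng.gauss(0.0, self.NOISE[fidelity])
        self.history.append((x, fidelity, score))
        done = self.budget < min(self.COSTS.values())
        return score, done


def cheap_then_expensive(env, candidates):
    """Learn from cheap, optimize expensive: screen every candidate at low
    fidelity, then spend one expensive run verifying the apparent best."""
    screened = [(env.step(x, "low")[0], x) for x in candidates]
    best_x = max(screened)[1]
    final_score, _ = env.step(best_x, "high")
    return best_x, final_score
```

A trained agent would replace the fixed screen-then-verify policy with learned cross-fidelity extrapolation, deciding adaptively which configuration and fidelity to query next given the history; the environment interface stays the same.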