Hugging Face Daily Papers · June 11, 2026 · 3 min read

EvoTrainer: Co-Evolving LLM Policies and Training Harnesses for Autonomous Agentic Reinforcement Learning

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.


Autonomous training for large language models (LLMs) is entering a new era. Rather than relying on a static recipe, EvoTrainer enables LLM policies and their training harnesses to evolve jointly over time. This is more than conventional AI development; it is AI evolution in action.

arxiv:2606.03108

Published on Jun 2 · Submitted by youyou on Jun 11

TongyiLab

Authors: Guhong Chen, Yingcheng Shi, Yongbin Li, Binhua Li, Xander Xu, Hu Wei, Shiwen Ni, Min Yang, Jieping Ye

Abstract

AI-generated summary (Qwen/Qwen2.5-Coder-32B-Instruct): EvoTrainer autonomously evolves both language model policies and training harnesses through empirical feedback, demonstrating superior performance in complex reasoning and coding tasks compared to traditional handcrafted approaches.

Autonomous LLM training is often framed as recipe search, which leaves the training harness largely static. This limitation sharpens in agentic RL, where shifting bottlenecks and scalar rewards mask diverse failure modes. We introduce EvoTrainer, an autonomous training framework that co-evolves LLM policies and training-side harnesses through empirical feedback: it diagnoses rollout-level evidence, revises diagnostics, backtests interventions, and accumulates reusable skills. Evaluated on mathematical reasoning, competitive-programming code generation, and repository-level software engineering, EvoTrainer matches or exceeds the human-engineered RL references under the same data, codebase, and evaluation protocol, with the largest gain on long-horizon agentic SWE. Trajectory analyses show that retained strategies diverge across domains, evolving diagnostics prevent invalid high-scoring branches from being promoted, and reusable skills shape later search. Autonomous LLM RL should move beyond recipe search toward joint evolution of policies and the training harnesses that interpret them.
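The abstract names four steps: diagnose rollout-level evidence, revise diagnostics, backtest interventions, and accumulate reusable skills. The Python toy below is a minimal sketch of how those steps could fit into one loop, assuming a two-knob stand-in policy, a simulated environment, and a single reward-hacking shortcut. Every name in it (`Rollout`, `Policy`, `simulate`, `revise_diagnostics`, `co_evolve`, the `skip_tests` hack, the `no_tests_executed` check) is hypothetical and for illustration only; it is not EvoTrainer's implementation, and a real system would update model weights with RL rather than nudge a number.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Rollout:
    reward: float   # scalar reward; on its own it can mask failure modes
    tests_run: int  # rollout-level evidence that diagnostics can inspect


@dataclass(frozen=True)
class Policy:
    quality: float = 0.5      # hypothetical stand-in for what RL updates would improve
    skip_tests: bool = False  # hypothetical shortcut that games the scalar reward


Diagnostic = Callable[[List[Rollout]], Optional[str]]

# Hypothetical interventions: one reward hack, one genuine improvement.
INTERVENTIONS: Dict[str, Callable[[Policy], Policy]] = {
    "skip_tests": lambda p: replace(p, skip_tests=True),
    "improve_quality": lambda p: replace(p, quality=p.quality + 0.1),
}


def simulate(policy: Policy, seeds: range) -> List[Rollout]:
    """Toy environment: skipping tests earns a perfect reward with no evidence."""
    if policy.skip_tests:
        return [Rollout(reward=1.0, tests_run=0) for _ in seeds]
    return [Rollout(reward=policy.quality, tests_run=3 + s % 3) for s in seeds]


def mean_reward(rollouts: List[Rollout]) -> float:
    return sum(r.reward for r in rollouts) / len(rollouts)


def no_tests_executed(rollouts: List[Rollout]) -> Optional[str]:
    if any(r.tests_run == 0 for r in rollouts):
        return "reward earned without running any tests"
    return None


def revise_diagnostics(diagnostics: Dict[str, Diagnostic], evidence: List[Rollout]) -> None:
    """Harness side: add a check once evidence exposes a failure the reward hides."""
    if any(r.reward > 0 and r.tests_run == 0 for r in evidence):
        diagnostics.setdefault("no_tests_executed", no_tests_executed)


def co_evolve(policy: Policy, steps: int) -> Tuple[Policy, float, List[str], List[str]]:
    diagnostics: Dict[str, Diagnostic] = {}
    skills: List[str] = []  # interventions that were promoted at least once
    backtest_seeds = range(100, 104)  # held out from the diagnosis seeds
    best = mean_reward(simulate(policy, backtest_seeds))
    for _ in range(steps):
        # Previously promoted skills are tried first, shaping later search.
        order = skills + [n for n in INTERVENTIONS if n not in skills]
        for name in order:
            candidate = INTERVENTIONS[name](policy)
            revise_diagnostics(diagnostics, simulate(candidate, range(0, 4)))  # diagnose
            held_out = simulate(candidate, backtest_seeds)                     # backtest
            failures = [m for d in diagnostics.values() if (m := d(held_out))]
            score = mean_reward(held_out)
            if score > best and not failures:  # never promote invalid branches
                policy, best = candidate, score
                if name not in skills:
                    skills.append(name)
                break
    return policy, best, skills, sorted(diagnostics)


policy, best, skills, checks = co_evolve(Policy(), steps=3)
print(policy, round(best, 2), skills, checks)
```

In this toy, the `skip_tests` branch scores a perfect 1.0 on the backtest but is rejected, because the harness adds a check as soon as that branch's rollouts show reward without test evidence. `improve_quality` is promoted, stored as a skill, and tried first on later steps.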


Community

shiyingcheng

Paper submitter about 14 hours ago


Get this paper in your agent:

hf papers read 2606.03108

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper: 0

Datasets citing this paper: 0

Spaces citing this paper: 0

Collections including this paper: 0

Discussion (0)

