Hugging Face Daily Papers · 7 min read

MetaAgent-X : Breaking the Ceiling of Automatic Multi-Agent Systems via End-to-End Reinforcement Learning

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2605.14212

Published on May 14, 2026 · Submitted by Yaolun Zhang on May 18, 2026
Authors: Yaolun Zhang, Yujie Zhao, Nan Wang, Yiran Wu, Jiayu Chang, Yizhao Chen, Qingyun Wu, Jishen Zhao, Huazheng Wang
Organization: Oregon State University
Project page: https://mercury7353.github.io/MetaAgent-X-Page/

Abstract

AI-generated summary: MetaAgent-X presents an end-to-end reinforcement learning framework that jointly optimizes automatic multi-agent system design and execution through hierarchical rollout and stagewise co-evolution techniques.

Automatic multi-agent systems (MAS) aim to instantiate agent workflows without relying on manually designed or fixed orchestration. However, existing automatic MAS approaches remain only partially adaptive: they either perform training-free test-time search or optimize the meta-level designer while keeping downstream execution agents frozen. This creates a frozen-executor ceiling and leaves the end-to-end training of self-designing and self-executing agentic models unexplored. To address this, we introduce MetaAgent-X, an end-to-end reinforcement learning framework that jointly optimizes automatic MAS design and execution. MetaAgent-X enables script-based MAS generation, execution rollout collection, and credit assignment for both designer and executor trajectories. To support stable and scalable optimization, we propose Executor Designer Hierarchical Rollout and Stagewise Co-evolution, which improve training stability and expose the dynamics of designer-executor co-evolution. MetaAgent-X consistently outperforms existing automatic MAS baselines, achieving gains of up to 21.7%. Comprehensive ablations show that both designer and executor improve throughout training, and that effective automatic MAS learning follows a stagewise co-evolution process. These results establish end-to-end trainable automatic MAS as a practical paradigm for building self-designing and self-executing agentic models.
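
The abstract does not spell out how credit is split between designer and executor trajectories, so the following is only an illustrative sketch of one plausible reading, not the paper's algorithm. It assumes each designer output (a MAS script) is executed several times, that the designer is scored by the mean reward of its script relative to the other scripts, and that each executor rollout is scored relative to the other rollouts of the same script. The names `hierarchical_advantages` and `stage_schedule`, the alternating schedule, and the reward values are all hypothetical.

```python
from statistics import mean


def hierarchical_advantages(rollouts):
    """Two-level, group-relative credit assignment (illustrative assumption).

    rollouts maps a design id to the list of rewards obtained by running
    that design several times with the executor.
    """
    # Value of a design = average reward of its executor rollouts.
    design_value = {d: mean(rs) for d, rs in rollouts.items()}
    # Designer baseline = average value over all sampled designs.
    baseline = mean(list(design_value.values()))
    designer_adv = {d: v - baseline for d, v in design_value.items()}
    # Executor baseline = value of the design the rollout belongs to.
    executor_adv = {
        d: [r - design_value[d] for r in rs] for d, rs in rollouts.items()
    }
    return designer_adv, executor_adv


def stage_schedule(step, steps_per_stage):
    """Hypothetical stagewise schedule: alternate which role is updated."""
    return "designer" if (step // steps_per_stage) % 2 == 0 else "executor"


if __name__ == "__main__":
    # Made-up binary task rewards for two sampled scripts.
    rollouts = {
        "script_a": [1.0, 0.0, 1.0, 1.0],
        "script_b": [0.0, 0.0, 1.0, 0.0],
    }
    d_adv, e_adv = hierarchical_advantages(rollouts)
    print(d_adv)
    print(e_adv["script_a"])
    print([stage_schedule(s, 2) for s in range(6)])
```

Separating the two baselines keeps the signals distinct: a designer is not penalized for one unlucky execution of a good script, and an executor is not rewarded merely for being handed an easy script. Whether MetaAgent-X uses this exact decomposition is not stated in the abstract.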

Community


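This is an automated message from the Librarian Bot (https://huggingface.co/librarian-bots). The following similar papers were recommended by the Semantic Scholar API:

* Small Model as Master Orchestrator: Learning Unified Agent-Tool Orchestration with Parallel Subtask Decomposition (2026), https://huggingface.co/papers/2604.17009
* LEMON: Learning Executable Multi-Agent Orchestration via Counterfactual Reinforcement Learning (2026), https://huggingface.co/papers/2605.14483
* EvoMAS: Learning Execution-Time Workflows for Multi-Agent Systems (2026), https://huggingface.co/papers/2605.08769
* Learning to Communicate: Toward End-to-End Optimization of Multi-Agent Language Systems (2026), https://huggingface.co/papers/2604.21794
* MARS$^2$: Scaling Multi-Agent Tree Search via Reinforcement Learning for Code Generation (2026), https://huggingface.co/papers/2604.14564
* Reinforced Collaboration in Multi-Agent Flow Networks (2026), https://huggingface.co/papers/2605.12943
* Unified-MAS: Universally Generating Domain-Specific Nodes for Empowering Automatic Multi-Agent Systems (2026), https://huggingface.co/papers/2603.21475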


Get this paper in your agent:

hf papers read 2605.14212
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 1

