EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL
Authors: Minrui Xu, Zilin Wang, Mengyi DENG, Zhiwei Li, Zhicheng Yang, Xiao Zhu, Yinhong Liu, Boyu Zhu, Baiyu Huang, Chao Chen, Heyuan Deng, Fei Mi, Lifeng Shang, Xingshan Zeng, Zhijiang Guo
Organization: LARK Lab@HKUST (GZ)
Published: 2026-05-18
Code: https://github.com/LARK-AI-Lab/EnvFactory
AI-generated summary: EnvFactory automates the creation of executable tool environments and natural multi-turn trajectories for training LLMs with agentic reinforcement learning, achieving superior performance with fewer resources.
Abstract
Equipping LLMs with tool-use capabilities via Agentic Reinforcement Learning (Agentic RL) is bottlenecked by two challenges: the lack of scalable, robust execution environments and the scarcity of realistic training data that captures implicit human reasoning. Existing approaches depend on costly real-world APIs, hallucination-prone LLM simulators, or synthetic environments that are often single-turn or depend on pre-collected documents. Moreover, synthetic trajectories are frequently over-specified, resembling instruction sequences rather than natural human intents, which reduces their effectiveness for RL training. We introduce EnvFactory, a fully automated framework that addresses both challenges. EnvFactory autonomously explores and verifies stateful, executable tool environments from authentic resources, and synthesizes natural multi-turn trajectories through topology-aware sampling and calibrated refinement, producing grounded queries with implicit intents. Using only 85 verified environments across 7 domains, EnvFactory generates 2,575 SFT and RL trajectories. Despite using significantly fewer environments than prior work, which often uses about five times as many, EnvFactory achieves superior training efficiency and downstream performance, improving Qwen3-series models by up to 15% on BFCLv3, 8.6% on MCP-Atlas, and 6% on conversational benchmarks including τ²-Bench and VitaBench. By fully automating both environment construction and trajectory synthesis, EnvFactory provides a scalable, extensible, and robust foundation for Agentic RL.
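To make two terms from the abstract concrete, here is a minimal Python sketch of what a stateful, executable tool environment and a graph-based sampler over tool dependencies could look like. Everything in it is an assumption for illustration: the `CartEnv` class, its tools, the `TOOL_GRAPH` edges, and the random-walk sampler are hypothetical and are not taken from EnvFactory, whose actual sampling and verification procedures the abstract does not spell out.

```python
import random
from dataclasses import dataclass, field


@dataclass
class CartEnv:
    """Toy stateful tool environment (hypothetical, not EnvFactory code)."""
    stock: dict = field(default_factory=lambda: {"pen": 3, "ink": 1})
    cart: dict = field(default_factory=dict)
    orders: list = field(default_factory=list)

    def search(self, item):
        # Read-only tool: reports the current stock level.
        return {"item": item, "available": self.stock.get(item, 0)}

    def add_to_cart(self, item, qty):
        # Mutating tool: moves units from stock into the cart.
        if self.stock.get(item, 0) < qty:
            return {"error": "insufficient stock"}
        self.stock[item] -= qty
        self.cart[item] = self.cart.get(item, 0) + qty
        return {"cart": dict(self.cart)}

    def checkout(self):
        # Mutating tool: only succeeds if earlier calls filled the cart.
        if not self.cart:
            return {"error": "empty cart"}
        self.orders.append(dict(self.cart))
        self.cart.clear()
        return {"order_id": len(self.orders)}


# Assumed dependency graph: an edge A -> B means B can meaningfully follow A.
TOOL_GRAPH = {
    "search": ["add_to_cart"],
    "add_to_cart": ["add_to_cart", "checkout"],
    "checkout": [],
}


def sample_tool_chain(graph, start, max_len, rng):
    """Random walk along dependency edges, so every step has its prerequisite."""
    chain = [start]
    while len(chain) < max_len and graph[chain[-1]]:
        chain.append(rng.choice(graph[chain[-1]]))
    return chain


def run(env, calls):
    """Execute (tool_name, kwargs) pairs against the environment in order."""
    return [getattr(env, name)(**kwargs) for name, kwargs in calls]


# Usage: execute a three-step trajectory, then check it by inspecting state.
env = CartEnv()
results = run(env, [
    ("search", {"item": "pen"}),
    ("add_to_cart", {"item": "pen", "qty": 2}),
    ("checkout", {}),
])
chain = sample_tool_chain(TOOL_GRAPH, "search", 4, random.Random(0))
```

Because the tools share state, a trajectory can be checked by executing it and inspecting the final state (here `env.orders` and `env.stock`) rather than by trusting a simulated response. Sampling along graph edges is one simple way to keep multi-turn tool sequences coherent before any natural-language query is written for them.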
Paper: arxiv.org/abs/2605.18703