<a href=\"https://cdn-uploads.huggingface.co/production/uploads/63ef330b1e695b35aa484e11/J1G-c_MdoqRAWlqrNGYza.png\" rel=\"nofollow\"><img src=\"https://cdn-uploads.huggingface.co/production/uploads/63ef330b1e695b35aa484e11/J1G-c_MdoqRAWlqrNGYza.png\" alt=\"Orchard_overview\"></a></p>\n<p><a href=\"https://cdn-uploads.huggingface.co/production/uploads/63ef330b1e695b35aa484e11/efR8GKf-wGhAO123c7GN5.png\" rel=\"nofollow\"><img src=\"https://cdn-uploads.huggingface.co/production/uploads/63ef330b1e695b35aa484e11/efR8GKf-wGhAO123c7GN5.png\" alt=\"Orchard_env_overview\"></a></p>\n<p><a href=\"https://cdn-uploads.huggingface.co/production/uploads/63ef330b1e695b35aa484e11/dConIn8LdU28bT0KChUT5.png\" rel=\"nofollow\"><img src=\"https://cdn-uploads.huggingface.co/production/uploads/63ef330b1e695b35aa484e11/dConIn8LdU28bT0KChUT5.png\" alt=\"Orchard_performance_overview\"></a></p>\n","updatedAt":"2026-05-15T02:27:10.004Z","author":{"_id":"63ef330b1e695b35aa484e11","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/63ef330b1e695b35aa484e11/bXwpGy0dl8JXeJwJ--ilr.jpeg","fullname":"Qianhui WU","name":"qianhuiwu","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":11,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.32049039006233215},"editors":["qianhuiwu"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/63ef330b1e695b35aa484e11/bXwpGy0dl8JXeJwJ--ilr.jpeg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2605.15040","authors":[{"_id":"6a068176b1a8cbabc9f09888","name":"Baolin Peng","hidden":false},{"_id":"6a068176b1a8cbabc9f09889","name":"Wenlin Yao","hidden":false},{"_id":"6a068176b1a8cbabc9f0988a","name":"Qianhui Wu","hidden":false},{"_id":"6a068176b1a8cbabc9f0988b","name":"Hao Cheng","hidden":false},{"_id":"6a068176b1a8cbabc9f0988c","name":"Xiao Yu","hidden":false},{"_id":"6a068176b1a8cbabc9f0988d","name":"Rui Yang","hidden":false},{"_id":"6a068176b1a8cbabc9f0988e","name":"Tao Ge","hidden":false},{"_id":"6a068176b1a8cbabc9f0988f","name":"Alessandrio Sordoni","hidden":false},{"_id":"6a068176b1a8cbabc9f09890","name":"Xingdi Yuan","hidden":false},{"_id":"6a068176b1a8cbabc9f09891","name":"Yelong Shen","hidden":false},{"_id":"6a068176b1a8cbabc9f09892","name":"Pengcheng He","hidden":false},{"_id":"6a068176b1a8cbabc9f09893","name":"Tong Zhang","hidden":false},{"_id":"6a068176b1a8cbabc9f09894","name":"Zhou Yu","hidden":false},{"_id":"6a068176b1a8cbabc9f09895","name":"Jianfeng Gao","hidden":false}],"publishedAt":"2026-05-14T00:00:00.000Z","submittedOnDailyAt":"2026-05-15T00:00:00.000Z","title":"Orchard: An Open-Source Agentic Modeling Framework","submittedOnDailyBy":{"_id":"63ef330b1e695b35aa484e11","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/63ef330b1e695b35aa484e11/bXwpGy0dl8JXeJwJ--ilr.jpeg","isPro":false,"fullname":"Qianhui WU","user":"qianhuiwu","type":"user","name":"qianhuiwu"},"summary":"Agentic modeling aims to transform LLMs into autonomous agents capable of solving complex tasks through planning, reasoning, tool use, and multi-turn interaction with environments. Despite major investment, open research remains constrained by infrastructure and training gaps. Many high-performing systems rely on proprietary codebases, models, or services, while most open-source frameworks focus on orchestration and evaluation rather than scalable agent training. We present Orchard, an open-source framework for scalable agentic modeling. 
At its core is Orchard Env, a lightweight environment service providing reusable primitives for sandbox lifecycle management across task domains, agent harnesses, and pipeline stages. On top of Orchard Env, we build three agentic modeling recipes. Orchard-SWE targets coding agents. We distill 107K trajectories from MiniMax-M2.5 and Qwen3.5-397B, introduce credit-assignment SFT to learn from productive segments of unresolved trajectories, and apply Balanced Adaptive Rollout for RL. Starting from Qwen3-30B-A3B-Thinking, Orchard-SWE achieves 64.3% on SWE-bench Verified after SFT and 67.5% after SFT+RL, setting a new state of the art among open-source models of comparable size. Orchard-GUI trains a 4B vision-language computer-use agent using only 0.4K distilled trajectories and 2.2K open-ended tasks. It achieves 74.1%, 67.0%, and 64.0% success rates on WebVoyager, Online-Mind2Web, and DeepShop, respectively, making it the strongest open-source model while remaining competitive with proprietary systems. Orchard-Claw targets personal assistant agents. Trained with only 0.2K synthetic tasks, it achieves 59.6% pass@3 on Claw-Eval and 73.9% when paired with a stronger ZeroClaw harness. Collectively, these results show that a lightweight, open, harness-agnostic environment layer enables reusable agentic data, training recipes, and evaluations across domains.","upvotes":10,"discussionId":"6a068176b1a8cbabc9f09896","githubRepo":"https://github.com/microsoft/Orchard","githubRepoAddedBy":"user","ai_summary":"Orchard is an open-source framework for scalable agentic modeling that enables training diverse autonomous agents through specialized recipes for coding, GUI navigation, and personal assistance tasks.","ai_keywords":["agentic modeling","large language models","planning","reasoning","tool use","multi-turn interaction","environment service","sandbox lifecycle management","agentic modeling recipes","SWE-bench","credit-assignment SFT","Balanced Adaptive Rollout","vision-language models","WebVoyager","Online-Mind2Web","DeepShop","Claw-Eval","ZeroClaw harness"],"githubStars":9,"organization":{"_id":"68151d0f51add3813f3f7d1b","name":"MicrosoftResearch","fullname":"Microsoft Research","avatar":"https://cdn-avatars.huggingface.co/v1/production/uploads/6529a4f2f1205983224fa513/PeuVr7jSuJflmDBBGxoDX.png"}},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"63ef330b1e695b35aa484e11","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/63ef330b1e695b35aa484e11/bXwpGy0dl8JXeJwJ--ilr.jpeg","isPro":false,"fullname":"Qianhui WU","user":"qianhuiwu","type":"user"},{"_id":"61942296d5c2ba6daa290357","avatarUrl":"/avatars/594021cc183c4922d48b46f43772a062.svg","isPro":false,"fullname":"Baolin Peng","user":"Baolin","type":"user"},{"_id":"64d45451c34a346181b130dd","avatarUrl":"/avatars/9bb8205b889337df5d321539c9b5d69d.svg","isPro":true,"fullname":"Rui Yang","user":"Ray2333","type":"user"},{"_id":"64b785384df206a3ed142dc0","avatarUrl":"/avatars/501a90b2c80d9b3a2e0d1819a4211f84.svg","isPro":false,"fullname":"Da Yu","user":"Jellyfish0538","type":"user"},{"_id":"6495947b3eaaf416d924daeb","avatarUrl":"/avatars/2e79b75e0f5f54f11471affc2bb377b6.svg","isPro":false,"fullname":"Eric Yuan","user":"eryua","type":"user"},{"_id":"62927c2e56fedc76e396b3ca","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1678105603200-62927c2e56fedc76e396b3ca.jpeg","isPro":false,"fullname":"HAO 
BAI","user":"JackBAI","type":"user"},{"_id":"6234fd736dcfc5fe9f5b8601","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1647639915850-noauth.jpeg","isPro":false,"fullname":"Xiao Yu","user":"jasonyux","type":"user"},{"_id":"6700b1f93381f2db06857fb5","avatarUrl":"/avatars/c8b9ec7c00773c5a4055ba50de0c6b2f.svg","isPro":false,"fullname":"Hanyang Chen","user":"Hanyang81","type":"user"},{"_id":"620783f24e28382272337ba4","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/620783f24e28382272337ba4/zkUveQPNiDfYjgGhuFErj.jpeg","isPro":false,"fullname":"GuoLiangTang","user":"Tommy930","type":"user"},{"_id":"630777bfcb09c0a9042bdb7d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/630777bfcb09c0a9042bdb7d/866tpyAMq5-49bil5RvhB.png","isPro":false,"fullname":"Phillip Hughes","user":"Osophy","type":"user"}],"acceptLanguages":["en"],"dailyPaperRank":0,"organization":{"_id":"68151d0f51add3813f3f7d1b","name":"MicrosoftResearch","fullname":"Microsoft Research","avatar":"https://cdn-avatars.huggingface.co/v1/production/uploads/6529a4f2f1205983224fa513/PeuVr7jSuJflmDBBGxoDX.png"},"markdownContentUrl":"https://huggingface.co/buckets/huggingchat/papers-content/resolve/2605/2605.15040.md"}">
Orchard: An Open-Source Agentic Modeling Framework
Authors: Baolin Peng, Wenlin Yao, Qianhui Wu, Hao Cheng, Xiao Yu, Rui Yang, Tao Ge, Alessandrio Sordoni, Xingdi Yuan, Yelong Shen, Pengcheng He, Tong Zhang, Zhou Yu, Jianfeng Gao
AI-generated summary
Orchard is an open-source framework for scalable agentic modeling that enables training diverse autonomous agents through specialized recipes for coding, GUI navigation, and personal assistance tasks.
Abstract
Agentic modeling aims to transform LLMs into autonomous agents capable of solving complex tasks through planning, reasoning, tool use, and multi-turn interaction with environments. Despite major investment, open research remains constrained by infrastructure and training gaps. Many high-performing systems rely on proprietary codebases, models, or services, while most open-source frameworks focus on orchestration and evaluation rather than scalable agent training. We present Orchard, an open-source framework for scalable agentic modeling. At its core is Orchard Env, a lightweight environment service providing reusable primitives for sandbox lifecycle management across task domains, agent harnesses, and pipeline stages. On top of Orchard Env, we build three agentic modeling recipes. Orchard-SWE targets coding agents. We distill 107K trajectories from MiniMax-M2.5 and Qwen3.5-397B, introduce credit-assignment SFT to learn from productive segments of unresolved trajectories, and apply Balanced Adaptive Rollout for RL. Starting from Qwen3-30B-A3B-Thinking, Orchard-SWE achieves 64.3% on SWE-bench Verified after SFT and 67.5% after SFT+RL, setting a new state of the art among open-source models of comparable size. Orchard-GUI trains a 4B vision-language computer-use agent using only 0.4K distilled trajectories and 2.2K open-ended tasks. It achieves 74.1%, 67.0%, and 64.0% success rates on WebVoyager, Online-Mind2Web, and DeepShop, respectively, making it the strongest open-source model while remaining competitive with proprietary systems. Orchard-Claw targets personal assistant agents. Trained with only 0.2K synthetic tasks, it achieves 59.6% pass@3 on Claw-Eval and 73.9% when paired with a stronger ZeroClaw harness. Collectively, these results show that a lightweight, open, harness-agnostic environment layer enables reusable agentic data, training recipes, and evaluations across domains.
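To make the environment layer concrete, the sketch below shows in Python what harness-agnostic sandbox lifecycle primitives could look like: a service that provisions an isolated workspace per trajectory, executes tool calls inside it, and tears it down when the rollout ends. This is a hypothetical illustration under our own naming assumptions (SandboxService, Sandbox, run, teardown), not the actual Orchard Env API; the real service is described in the paper and the GitHub repository.

```python
# Minimal, illustrative sketch of sandbox lifecycle primitives.
# All names here are assumptions for illustration, NOT the Orchard Env API.
import shutil
import subprocess
import tempfile
import uuid
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Sandbox:
    """One isolated workspace backing a single agent trajectory."""
    sandbox_id: str
    workdir: Path
    # Record of executed commands; useful for trajectory logging or
    # segment-level credit assignment during SFT.
    history: list = field(default_factory=list)

    def run(self, command: str, timeout: int = 60) -> str:
        """Execute a shell command inside the sandbox and record it."""
        result = subprocess.run(
            command, shell=True, cwd=self.workdir,
            capture_output=True, text=True, timeout=timeout,
        )
        self.history.append((command, result.returncode))
        return result.stdout + result.stderr


class SandboxService:
    """Lifecycle primitives: create, track, and tear down sandboxes."""

    def __init__(self) -> None:
        self._active: dict[str, Sandbox] = {}

    def create(self) -> Sandbox:
        """Provision a fresh sandbox (here just a temporary directory)."""
        workdir = Path(tempfile.mkdtemp(prefix="sandbox_"))
        box = Sandbox(sandbox_id=uuid.uuid4().hex, workdir=workdir)
        self._active[box.sandbox_id] = box
        return box

    def teardown(self, sandbox_id: str) -> None:
        """Release a sandbox; a real service would also stop its container."""
        box = self._active.pop(sandbox_id, None)
        if box is not None:
            shutil.rmtree(box.workdir, ignore_errors=True)


if __name__ == "__main__":
    service = SandboxService()
    box = service.create()
    print(box.run("echo 'hello from the sandbox'"))
    service.teardown(box.sandbox_id)
```

Because the agent harness only sees generic create/run/teardown primitives, the same environment layer can back data distillation, RL rollouts, and evaluation across the coding, GUI, and assistant recipes, which is the cross-domain reuse the abstract claims.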