<a href=\"https://cdn-uploads.huggingface.co/production/uploads/63ef330b1e695b35aa484e11/J1G-c_MdoqRAWlqrNGYza.png\" rel=\"nofollow\"><img src=\"https://cdn-uploads.huggingface.co/production/uploads/63ef330b1e695b35aa484e11/J1G-c_MdoqRAWlqrNGYza.png\" alt=\"Orchard_overview\"></a></p>\n<p><a href=\"https://cdn-uploads.huggingface.co/production/uploads/63ef330b1e695b35aa484e11/efR8GKf-wGhAO123c7GN5.png\" rel=\"nofollow\"><img src=\"https://cdn-uploads.huggingface.co/production/uploads/63ef330b1e695b35aa484e11/efR8GKf-wGhAO123c7GN5.png\" alt=\"Orchard_env_overview\"></a></p>\n<p><a href=\"https://cdn-uploads.huggingface.co/production/uploads/63ef330b1e695b35aa484e11/dConIn8LdU28bT0KChUT5.png\" rel=\"nofollow\"><img src=\"https://cdn-uploads.huggingface.co/production/uploads/63ef330b1e695b35aa484e11/dConIn8LdU28bT0KChUT5.png\" alt=\"Orchard_performance_overview\"></a></p>\n","updatedAt":"2026-05-15T02:27:10.004Z","author":{"_id":"63ef330b1e695b35aa484e11","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/63ef330b1e695b35aa484e11/bXwpGy0dl8JXeJwJ--ilr.jpeg","fullname":"Qianhui WU","name":"qianhuiwu","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":11,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.32049039006233215},"editors":["qianhuiwu"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/63ef330b1e695b35aa484e11/bXwpGy0dl8JXeJwJ--ilr.jpeg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2605.15040","authors":[{"_id":"6a068176b1a8cbabc9f09888","name":"Baolin Peng","hidden":false},{"_id":"6a068176b1a8cbabc9f09889","name":"Wenlin Yao","hidden":false},{"_id":"6a068176b1a8cbabc9f0988a","name":"Qianhui Wu","hidden":false},{"_id":"6a068176b1a8cbabc9f0988b","name":"Hao Cheng","hidden":false},{"_id":"6a068176b1a8cbabc9f0988c","name":"Xiao Yu","hidden":false},{"_id":"6a068176b1a8cbabc9f0988d","name":"Rui Yang","hidden":false},{"_id":"6a068176b1a8cbabc9f0988e","name":"Tao Ge","hidden":false},{"_id":"6a068176b1a8cbabc9f0988f","name":"Alessandrio Sordoni","hidden":false},{"_id":"6a068176b1a8cbabc9f09890","name":"Xingdi Yuan","hidden":false},{"_id":"6a068176b1a8cbabc9f09891","name":"Yelong Shen","hidden":false},{"_id":"6a068176b1a8cbabc9f09892","name":"Pengcheng He","hidden":false},{"_id":"6a068176b1a8cbabc9f09893","name":"Tong Zhang","hidden":false},{"_id":"6a068176b1a8cbabc9f09894","name":"Zhou Yu","hidden":false},{"_id":"6a068176b1a8cbabc9f09895","name":"Jianfeng Gao","hidden":false}],"publishedAt":"2026-05-14T00:00:00.000Z","submittedOnDailyAt":"2026-05-15T00:00:00.000Z","title":"Orchard: An Open-Source Agentic Modeling Framework","submittedOnDailyBy":{"_id":"63ef330b1e695b35aa484e11","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/63ef330b1e695b35aa484e11/bXwpGy0dl8JXeJwJ--ilr.jpeg","isPro":false,"fullname":"Qianhui WU","user":"qianhuiwu","type":"user","name":"qianhuiwu"},"summary":"Agentic modeling aims to transform LLMs into autonomous agents capable of solving complex tasks through planning, reasoning, tool use, and multi-turn interaction with environments. Despite major investment, open research remains constrained by infrastructure and training gaps. Many high-performing systems rely on proprietary codebases, models, or services, while most open-source frameworks focus on orchestration and evaluation rather than scalable agent training. We present Orchard, an open-source framework for scalable agentic modeling. 
At its core is Orchard Env, a lightweight environment service providing reusable primitives for sandbox lifecycle management across task domains, agent harnesses, and pipeline stages. On top of Orchard Env, we build three agentic modeling recipes. Orchard-SWE targets coding agents. We distill 107K trajectories from MiniMax-M2.5 and Qwen3.5-397B, introduce credit-assignment SFT to learn from productive segments of unresolved trajectories, and apply Balanced Adaptive Rollout for RL. Starting from Qwen3-30B-A3B-Thinking, Orchard-SWE achieves 64.3% on SWE-bench Verified after SFT and 67.5% after SFT+RL, setting a new state of the art among open-source models of comparable size. Orchard-GUI trains a 4B vision-language computer-use agent using only 0.4K distilled trajectories and 2.2K open-ended tasks. It achieves 74.1%, 67.0%, and 64.0% success rates on WebVoyager, Online-Mind2Web, and DeepShop, respectively, making it the strongest open-source model while remaining competitive with proprietary systems. Orchard-Claw targets personal assistant agents. Trained with only 0.2K synthetic tasks, it achieves 59.6% pass@3 on Claw-Eval and 73.9% when paired with a stronger ZeroClaw harness. Collectively, these results show that a lightweight, open, harness-agnostic environment layer enables reusable agentic data, training recipes, and evaluations across domains.","upvotes":10,"discussionId":"6a068176b1a8cbabc9f09896","githubRepo":"https://github.com/microsoft/Orchard","githubRepoAddedBy":"user","ai_summary":"Orchard is an open-source framework for scalable agentic modeling that enables training diverse autonomous agents through specialized recipes for coding, GUI navigation, and personal assistance tasks.","ai_keywords":["agentic modeling","large language models","planning","reasoning","tool use","multi-turn interaction","environment service","sandbox lifecycle management","agentic modeling recipes","SWE-bench","credit-assignment SFT","Balanced Adaptive Rollout","vision-language models","WebVoyager","Online-Mind2Web","DeepShop","Claw-Eval","ZeroClaw harness"],"githubStars":9,"organization":{"_id":"68151d0f51add3813f3f7d1b","name":"MicrosoftResearch","fullname":"Microsoft Research","avatar":"https://cdn-avatars.huggingface.co/v1/production/uploads/6529a4f2f1205983224fa513/PeuVr7jSuJflmDBBGxoDX.png"}},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"63ef330b1e695b35aa484e11","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/63ef330b1e695b35aa484e11/bXwpGy0dl8JXeJwJ--ilr.jpeg","isPro":false,"fullname":"Qianhui WU","user":"qianhuiwu","type":"user"},{"_id":"61942296d5c2ba6daa290357","avatarUrl":"/avatars/594021cc183c4922d48b46f43772a062.svg","isPro":false,"fullname":"Baolin Peng","user":"Baolin","type":"user"},{"_id":"64d45451c34a346181b130dd","avatarUrl":"/avatars/9bb8205b889337df5d321539c9b5d69d.svg","isPro":true,"fullname":"Rui Yang","user":"Ray2333","type":"user"},{"_id":"64b785384df206a3ed142dc0","avatarUrl":"/avatars/501a90b2c80d9b3a2e0d1819a4211f84.svg","isPro":false,"fullname":"Da Yu","user":"Jellyfish0538","type":"user"},{"_id":"6495947b3eaaf416d924daeb","avatarUrl":"/avatars/2e79b75e0f5f54f11471affc2bb377b6.svg","isPro":false,"fullname":"Eric Yuan","user":"eryua","type":"user"},{"_id":"62927c2e56fedc76e396b3ca","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1678105603200-62927c2e56fedc76e396b3ca.jpeg","isPro":false,"fullname":"HAO 
BAI","user":"JackBAI","type":"user"},{"_id":"6234fd736dcfc5fe9f5b8601","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1647639915850-noauth.jpeg","isPro":false,"fullname":"Xiao Yu","user":"jasonyux","type":"user"},{"_id":"6700b1f93381f2db06857fb5","avatarUrl":"/avatars/c8b9ec7c00773c5a4055ba50de0c6b2f.svg","isPro":false,"fullname":"Hanyang Chen","user":"Hanyang81","type":"user"},{"_id":"620783f24e28382272337ba4","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/620783f24e28382272337ba4/zkUveQPNiDfYjgGhuFErj.jpeg","isPro":false,"fullname":"GuoLiangTang","user":"Tommy930","type":"user"},{"_id":"630777bfcb09c0a9042bdb7d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/630777bfcb09c0a9042bdb7d/866tpyAMq5-49bil5RvhB.png","isPro":false,"fullname":"Phillip Hughes","user":"Osophy","type":"user"}],"acceptLanguages":["en"],"dailyPaperRank":0,"organization":{"_id":"68151d0f51add3813f3f7d1b","name":"MicrosoftResearch","fullname":"Microsoft Research","avatar":"https://cdn-avatars.huggingface.co/v1/production/uploads/6529a4f2f1205983224fa513/PeuVr7jSuJflmDBBGxoDX.png"},"markdownContentUrl":"https://huggingface.co/buckets/huggingchat/papers-content/resolve/2605/2605.15040.md"}">
Orchard: An Open-Source Agentic Modeling Framework
Authors: Baolin Peng, Wenlin Yao, Qianhui Wu, Hao Cheng, Xiao Yu, Rui Yang, Tao Ge, Alessandrio Sordoni, Xingdi Yuan, Yelong Shen, Pengcheng He, Tong Zhang, Zhou Yu, Jianfeng Gao
AI-generated summary
Orchard is an open-source framework for scalable agentic modeling that enables training diverse autonomous agents through specialized recipes for coding, GUI navigation, and personal assistance tasks.
Abstract
Agentic modeling aims to transform LLMs into autonomous agents capable of solving complex tasks through planning, reasoning, tool use, and multi-turn interaction with environments. Despite major investment, open research remains constrained by infrastructure and training gaps. Many high-performing systems rely on proprietary codebases, models, or services, while most open-source frameworks focus on orchestration and evaluation rather than scalable agent training. We present Orchard, an open-source framework for scalable agentic modeling. At its core is Orchard Env, a lightweight environment service providing reusable primitives for sandbox lifecycle management across task domains, agent harnesses, and pipeline stages. On top of Orchard Env, we build three agentic modeling recipes. Orchard-SWE targets coding agents. We distill 107K trajectories from MiniMax-M2.5 and Qwen3.5-397B, introduce credit-assignment SFT to learn from productive segments of unresolved trajectories, and apply Balanced Adaptive Rollout for RL. Starting from Qwen3-30B-A3B-Thinking, Orchard-SWE achieves 64.3% on SWE-bench Verified after SFT and 67.5% after SFT+RL, setting a new state of the art among open-source models of comparable size. Orchard-GUI trains a 4B vision-language computer-use agent using only 0.4K distilled trajectories and 2.2K open-ended tasks. It achieves 74.1%, 67.0%, and 64.0% success rates on WebVoyager, Online-Mind2Web, and DeepShop, respectively, making it the strongest open-source model while remaining competitive with proprietary systems. Orchard-Claw targets personal assistant agents. Trained with only 0.2K synthetic tasks, it achieves 59.6% pass@3 on Claw-Eval and 73.9% when paired with a stronger ZeroClaw harness. Collectively, these results show that a lightweight, open, harness-agnostic environment layer enables reusable agentic data, training recipes, and evaluations across domains.
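To make the environment layer concrete, the sketch below shows in Python what harness-agnostic sandbox lifecycle primitives could look like: a service that provisions an isolated workspace per trajectory, executes tool calls inside it, and tears it down when the rollout ends. This is a hypothetical illustration under our own naming assumptions (SandboxService, Sandbox, run, teardown), not the actual Orchard Env API; the real service is described in the paper and the GitHub repository.

```python
# Minimal, illustrative sketch of sandbox lifecycle primitives.
# All names here are assumptions for illustration, NOT the Orchard Env API.
import shutil
import subprocess
import tempfile
import uuid
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Sandbox:
    """One isolated workspace backing a single agent trajectory."""
    sandbox_id: str
    workdir: Path
    # Record of executed commands; useful for trajectory logging or
    # segment-level credit assignment during SFT.
    history: list = field(default_factory=list)

    def run(self, command: str, timeout: int = 60) -> str:
        """Execute a shell command inside the sandbox and record it."""
        result = subprocess.run(
            command, shell=True, cwd=self.workdir,
            capture_output=True, text=True, timeout=timeout,
        )
        self.history.append((command, result.returncode))
        return result.stdout + result.stderr


class SandboxService:
    """Lifecycle primitives: create, track, and tear down sandboxes."""

    def __init__(self) -> None:
        self._active: dict[str, Sandbox] = {}

    def create(self) -> Sandbox:
        """Provision a fresh sandbox (here just a temporary directory)."""
        workdir = Path(tempfile.mkdtemp(prefix="sandbox_"))
        box = Sandbox(sandbox_id=uuid.uuid4().hex, workdir=workdir)
        self._active[box.sandbox_id] = box
        return box

    def teardown(self, sandbox_id: str) -> None:
        """Release a sandbox; a real service would also stop its container."""
        box = self._active.pop(sandbox_id, None)
        if box is not None:
            shutil.rmtree(box.workdir, ignore_errors=True)


if __name__ == "__main__":
    service = SandboxService()
    box = service.create()
    print(box.run("echo 'hello from the sandbox'"))
    service.teardown(box.sandbox_id)
```

Because the agent harness only sees generic create/run/teardown primitives, the same environment layer can back data distillation, RL rollouts, and evaluation across the coding, GUI, and assistant recipes, which is the cross-domain reuse the abstract claims.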