DeNovoSWE: Scaling Long-Horizon Environments for Generating Entire Repositories from Scratch
Authors: Jiale Zhao, Guoxin Chen, Fanzhe Meng, Wayne Xin Zhao, Ruihua Song, Ji-Rong Wen, Kai Jia
Organization: AweAI Team
Published: 2026-06-09
arXiv: 2606.10728
Code: https://github.com/AweAI-Team/DeNovoSWE
Abstract
A large-scale dataset called DeNovoSWE is introduced for training code agents to generate entire software repositories from documentation, significantly improving performance on long-horizon software engineering tasks.
As the capabilities of LLM-based code agents continue to advance, their expected role is expanding beyond localized bug fixing in existing codebases toward architecting and implementing complete software repositories from high-level specifications. However, training agents for such long-horizon software engineering tasks remains difficult due to the scarcity of large-scale, verifiable whole-repository generation data. In this paper, we introduce DeNovoSWE, a large-scale dataset for whole-repository generation. DeNovoSWE comprises 4,818 high-quality instances, each of which requires generating a complete repository from documentation. The dataset is constructed automatically through a carefully designed sandboxed agentic workflow, enabling scalable curation without human annotation. Its construction follows a "divide and conquer" and critic-repair philosophy. To balance data quality and diversity, we further introduce a difficulty-aware trajectory filtering strategy. Fine-tuning Qwen3-30B-A3B on DeNovoSWE substantially improves long-horizon SWE performance, raising its score on the challenging BeyondSWE-Doc2Repo benchmark from 5.8% to 47.2%.