Hugging Face Daily Papers · 3 min read

DeNovoSWE: Scaling Long-Horizon Environments for Generating Entire Repositories from Scratch

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arXiv:2606.10728
Code: https://github.com/AweAI-Team/DeNovoSWE (AweAI Team)

Published on Jun 9, 2026 · Submitted by Guoxin Chen on Jun 11, 2026
Authors: Jiale Zhao, Guoxin Chen, Fanzhe Meng, Wayne Xin Zhao, Ruihua Song, Ji-Rong Wen, Kai Jia

Abstract

A large-scale dataset called DeNovoSWE is introduced for training code agents to generate entire software repositories from documentation, significantly improving performance on long-horizon software engineering tasks.

As the capabilities of LLM-based code agents continue to advance, their expected role is expanding beyond localized bug fixing in existing codebases toward architecting and implementing complete software repositories from high-level specifications. However, training agents for such long-horizon software engineering tasks remains difficult due to the scarcity of large-scale, verifiable whole-repository generation data. In this paper, we introduce DeNovoSWE, a large-scale dataset for whole-repository generation. DeNovoSWE comprises 4,818 high-quality instances, where each instance requires generating a complete repository from documentation. Our dataset is automatically constructed through a carefully designed sandboxed agentic workflow, enabling scalable curation without human annotation. The workflow follows a "divide and conquer" and critic-repair philosophy. To balance data quality and diversity, we further introduce a difficulty-aware trajectory filtering strategy. Fine-tuning Qwen3-30B-A3B on DeNovoSWE substantially improves long-horizon SWE performance, raising its score on the challenging BeyondSWE-Doc2Repo benchmark from 5.8% to 47.2%.
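
To put the reported benchmark numbers in perspective, the short sketch below works out the absolute and relative gain implied by the two scores quoted in the abstract (5.8% and 47.2%). The helper name `score_gain` is hypothetical and exists only for this illustration; nothing here comes from the paper's code.

```python
def score_gain(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, ratio of after to before)."""
    return after_pct - before_pct, after_pct / before_pct


# Scores quoted in the abstract for Qwen3-30B-A3B on BeyondSWE-Doc2Repo.
points, factor = score_gain(5.8, 47.2)
print(f"+{points:.1f} percentage points, {factor:.1f}x the baseline score")
# prints "+41.4 percentage points, 8.1x the baseline score"
```

In other words, the fine-tuned model scores roughly eight times higher than the base model on this benchmark, a gain of about 41 percentage points.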

Community

Paper submitter about 18 hours ago

As the capabilities of LLM-based code agents continue to advance, their expected role is expanding beyond localized bug fixing in existing codebases toward architecting and implementing complete software repositories from high-level specifications. However, training agents for such long-horizon software engineering tasks remains difficult due to the scarcity of large-scale, verifiable whole-repository generation data. In this paper, they introduce DeNovoSWE, a large-scale dataset for whole-repository generation.

Get this paper in your agent:

hf papers read 2606.10728
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
