Hugging Face Daily Papers · 4 min read

LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

🤗 Model: https://huggingface.co/Multilingual-Multimodal-NLP/LoopCoder-V2
💻 Code: https://github.com/CSJianYang/LoopCoder
📄 Paper: https://arxiv.org/abs/2606.18023

arxiv:2606.18023

Published on Jun 16, 2026 · Submitted by taesiri on Jun 17 · #1 Paper of the day

Authors: Jian Yang, Shawn Guo, Wei Zhang, Tianyu Zheng, Yaxin Du, Haau-Sing Li, Jiajun Wu, Yue Song, Yan Xing, Qingsong Cai, Zelong Huang, Chuan Hao, Ran Tao, Xianglong Liu, Wayne Xin Zhao, Mingjie Tang, Weifeng Lv, Ming Zhou, Bryan Dai

Abstract

Parallel loop Transformers achieve better code generation performance with two loops due to refined representations, while additional loops cause diminishing returns and increased positional mismatch costs.

Looped Transformers scale latent computation by repeatedly applying shared blocks, but sequential looping increases latency and KV-cache memory with the loop count. Parallel loop Transformers (PLT) alleviate this cost through cross-loop position offsets (CLP) and shared-KV gated sliding-window attention, making loop count a practical design choice. We therefore study PLT loop-count selection through a gain-cost view: an extra loop may refine representations, but CLP also introduces a positional mismatch at each loop boundary. We instantiate this study by training LoopCoder-v2, a family of 7B PLT coders with different loop counts, from scratch on 18T tokens, followed by matched instruction tuning and evaluation. Empirically, the two-loop variant delivers broad gains over the non-looped baseline across code generation, code reasoning, agentic software engineering, and tool-use benchmarks, improving SWE-bench Verified from 43.0 to 64.4 points and Multi-SWE from 14.0 to 31.0 points. In contrast, variants with three or more loops regress, revealing a strongly non-monotonic loop-count effect. Our diagnostics show that loop 2 provides the main productive refinement, while later loops yield diminishing, oscillatory updates and reduced representational diversity. Because the CLP-induced mismatch remains roughly fixed as refinement gains shrink, the offset cost increasingly dominates. This gain-cost trade-off explains PLT's saturation at two loops and provides diagnostics for loop-count selection.
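
The gain-cost argument in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's model or diagnostics: the function names, the geometric decay of refinement gains, and every numeric value are hypothetical, chosen only to show how a roughly fixed per-boundary cost can overtake shrinking gains after the second loop.

```python
# Toy illustration only: all names and numbers are assumptions,
# not values or methods taken from the LoopCoder-v2 paper.

def marginal_gain(loop, first_gain=1.0, decay=0.3):
    """Hypothetical refinement gain contributed by loop number `loop` (loop >= 2).

    Gains shrink geometrically: loop 2 yields `first_gain`,
    loop 3 yields `first_gain * decay`, and so on.
    """
    return first_gain * decay ** (loop - 2)


def net_benefit(num_loops, first_gain=1.0, decay=0.3, offset_cost=0.5):
    """Cumulative toy benefit of `num_loops` loops relative to a single pass.

    Each extra loop adds its shrinking gain and pays the same fixed
    `offset_cost`, standing in for the CLP positional mismatch.
    """
    total = 0.0
    for loop in range(2, num_loops + 1):
        total += marginal_gain(loop, first_gain, decay) - offset_cost
    return total


def best_loop_count(max_loops=6, **kwargs):
    """Loop count in 1..max_loops with the highest toy net benefit."""
    return max(range(1, max_loops + 1), key=lambda n: net_benefit(n, **kwargs))


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, round(net_benefit(n), 3))
    print("best loop count:", best_loop_count())
```

With these assumed parameters the second loop is the only one whose gain exceeds the fixed cost, so the net benefit peaks at two loops and declines afterwards. Different assumed values for `decay` or `offset_cost` would move the peak, which is why the abstract frames loop count as something to diagnose rather than fix in advance.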


Get this paper in your agent:

hf papers read 2606.18023
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

