Multilingual-Multimodal-NLP/LoopCoder-V2 · Hugging Face
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
LoopCoder-V2
LoopCoder-v2 is a 7B instruction-tuned code model based on the Parallel Loop Transformer (PLT). The model studies test-time computation scaling through repeated application of shared Transformer blocks while keeping the parameter count fixed. The released checkpoint is the two-loop PLT variant.
Highlights
LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling
TL;DR. For Parallel Loop Transformers (PLT), more looping is not better. A 7B coder that loops just once more than usual (two passes total) lifts SWE-bench Verified from 43.0 to 64.4, while three or more loops regress. We explain this with a gain–cost view of looping and provide diagnostics for picking the loop count without brute-force sweeps.
Overview
Looped Transformers scale latent computation by repeatedly applying a shared block, but sequential looping increases latency and KV-cache memory with the loop count. Parallel Loop Transformers (PLT) alleviate this with two mechanisms:
Once cost is flattened, loop count becomes a free design knob, and the question becomes: how many loops are actually worth it? We study this through a gain–cost lens: an extra loop may refine representations (gain), but CLP also introduces a roughly constant positional mismatch at each loop boundary (cost), which we quantify with an intrinsic offset cost Ω(r).
We instantiate the study with LoopCoder-v2, a family of 7B PLT coders trained from scratch on 18T tokens of mixed text and code (1:1, 100+ programming languages), under matched training, instruction tuning, and evaluation.
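To make the basic idea of looping concrete, here is a minimal toy sketch of plain sequential looping, not the PLT mechanism and not the LoopCoder-v2 implementation. It shows one shared block applied several times, so the parameter count stays fixed while the number of passes grows. All names (`make_block`, `apply_block`, `looped_forward`, `naive_kv_cache_entries`) are hypothetical, the block is a toy residual linear map rather than a Transformer layer, and the linear cache growth is an illustrative assumption for unshared sequential looping.

```python
import random


def make_block(dim, seed=0):
    """Create one toy shared block: a dim x dim weight matrix (hypothetical stand-in)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]


def apply_block(weights, hidden):
    """One pass through the shared block: residual update h + W h."""
    return [
        h + sum(w * x for w, x in zip(row, hidden))
        for h, row in zip(hidden, weights)
    ]


def looped_forward(weights, hidden, num_loops):
    """Apply the same block num_loops times in sequence, reusing the same weights."""
    for _ in range(num_loops):
        hidden = apply_block(weights, hidden)
    return hidden


def param_count(weights):
    """Number of parameters in the shared block; independent of the loop count."""
    return sum(len(row) for row in weights)


def naive_kv_cache_entries(num_tokens, num_layers, num_loops):
    """Assumed cache size for sequential looping with no sharing:
    one key/value entry per token, per layer, per loop."""
    return num_tokens * num_layers * num_loops


if __name__ == "__main__":
    block = make_block(dim=4)
    x = [1.0, 0.0, -1.0, 0.5]
    for loops in (1, 2, 3):
        out = looped_forward(block, x, loops)
        print(loops, param_count(block), naive_kv_cache_entries(10, 2, loops), out)
```

In this toy setting the parameter count is the same for every loop count, while the number of sequential passes and the assumed cache entries both grow with it, which is the cost that the overview says PLT is designed to flatten.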