r/LocalLLaMA · · 1 min read

Scaling former VibeThinker-1.5B to 3B — now it reaches frontier math & coding performance

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

Scaling former VibeThinker-1.5B to 3B — now it reaches frontier math & coding performance

https://preview.redd.it/obgodr9dfn7h1.png?width=1796&format=png&auto=webp&s=b5fd95e2b7e6f8ed7704e3de66778e970d34a1dd

  1. We trained VibeThinker-3B to test how far verifiable reasoning can be pushed in a strict small-model regime.
  2. It gets 94.3 on AIME'26, 80.2 on LiveCodeBench v6, 76.4 on IMO-AnswerBench, and 93.4 on IFEval.
  3. On recent unseen LeetCode weekly/biweekly contests, it passes 123/128 first-attempt Python submissions, or 96.1% overall.
  4. Small models are not just cheaper substitutes. In parameter-dense domains with clear verification signals, SLMs offer a path to frontier-level reasoning that complements traditional Scaling Law. Though it still has limitations in broader practical and general-purpose use cases, we will keep improving these areas in future versions.

We’d love for the community to test it on your own math/coding/OOD tasks and share failures or feedback.

Paper: paper link
Eval setting in the report: vLLM/Sglang, temp=1.0, top_p=0.95, top_k=-1.

submitted by /u/Used-Negotiation-741
[link] [comments]

Discussion (0)

Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.

Sign in →

No comments yet. Sign in and be the first to say something.

More from r/LocalLLaMA