r/LocalLLaMA

GLM-5.2 benchmarked on DeepSWE: Beats Gemini & GPT-5.4, but the token volume/cost makes it wildly inefficient? (Theo - t3.gg)

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

Saw this breakdown from Theo (t3.gg) on X showing the latest DeepSWE leaderboard stats for the new GLM-5.2 open-weight model.

The good news: it's officially surpassing GPT-5.4 and the entire Gemini lineup in raw coding capability. Seeing an open-weight model punch that high is incredibly dope.

The catch? It is not cheap to run. According to the chart:

- GPT-5.5 (medium) and Claude Opus 4.8 (high) both score higher and cost less per task on average.
- GLM-5.2 sits far lower on the efficiency curve despite its open-weight status.

Theo points out a massive caveat in the replies: GLM-5.2 apparently uses way more output tokens. So even if its per-token price looks cheap on paper, the sheer number of tokens it needs to finish a task drives the total cost per task way up.
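To make Theo's caveat concrete: cost per task is roughly (input tokens × input price) + (output tokens × output price), so a low per-token price can get wiped out by token volume. Here's a quick Python sketch of that arithmetic. Every price and token count in it is made up for illustration and is not from the DeepSWE chart or any provider's actual pricing. It also ignores things like cached-input discounts and retries.

```python
from dataclasses import dataclass


@dataclass
class ModelRun:
    """Hypothetical per-task usage and pricing for one model (illustrative numbers only)."""
    name: str
    input_price_per_m: float   # USD per 1M input tokens (assumed)
    output_price_per_m: float  # USD per 1M output tokens (assumed)
    input_tokens: int          # tokens read while solving one task (assumed)
    output_tokens: int         # tokens generated while solving one task (assumed)


def cost_per_task(run: ModelRun) -> float:
    """Total USD cost of one task: each token count times its per-token price."""
    return (run.input_tokens * run.input_price_per_m
            + run.output_tokens * run.output_price_per_m) / 1_000_000


# Made-up figures, NOT taken from the DeepSWE chart: a model with a low
# per-token price but very verbose output vs. a pricier, terser model.
cheap_but_verbose = ModelRun("cheap-per-token", 1.0, 4.0, 2_000_000, 1_500_000)
pricey_but_terse = ModelRun("pricey-per-token", 5.0, 25.0, 400_000, 100_000)

for run in (cheap_but_verbose, pricey_but_terse):
    # prints "cheap-per-token: $8.00 per task", then "pricey-per-token: $4.50 per task"
    print(f"{run.name}: ${cost_per_task(run):.2f} per task")
```

With these toy numbers the "cheap" model charges a fifth of the input price and under a sixth of the output price, yet ends up costing almost twice as much per task because it burns far more tokens.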

submitted by /u/klippers