r/LocalLLaMA · · 1 min read

GLM 5.2: 98% of max-level intelligence with less than half the token usage

According to this, the number of reasoning tokens more than doubled from GLM 5.1 to GLM 5.2, from 16.7k to 36.7k. For me, as a local user with an old junk Xeon setup, this makes GLM 5.2 unusable, to the point where I had to shut the model down after waiting 12 hours for it to answer my math problem.

But then I saw this graph from the z_ai technical report, which basically implies that on the high level you can use less than half of the tokens of max effort and still get around 98% of max-level intelligence, at least in coding tasks. So I encourage both local and API users to try the high level, because by default GLM 5.2 is set to the max level.

https://preview.redd.it/eha9j6vd9e8h1.png?width=6166&format=png&auto=webp&s=204c3261fada0c3eac8e4ab52fed7b45c1831b7b
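
As a rough sanity check on the numbers above, here is a minimal Python sketch of the arithmetic. The function names are made up for illustration, and the 0.98 and 0.5 fractions are the approximate values quoted from the graph ("around 98%", "less than half"), so the per-token gain it prints should be read as a lower bound, not a measured result.

```python
# Figures quoted in the post.
GLM_51_REASONING_TOKENS = 16_700
GLM_52_REASONING_TOKENS = 36_700

# Approximate values read from the graph (assumptions, not exact data):
HIGH_SCORE_FRACTION = 0.98   # "around 98% of max level intelligence"
HIGH_TOKEN_FRACTION = 0.5    # "less than half of the tokens", so an upper bound


def growth_ratio(old_tokens: int, new_tokens: int) -> float:
    """How many times more reasoning tokens the new model uses."""
    return new_tokens / old_tokens


def relative_score_per_token(score_fraction: float, token_fraction: float) -> float:
    """Score obtained per token on 'high', relative to 'max' (max = 1.0)."""
    return score_fraction / token_fraction


ratio = growth_ratio(GLM_51_REASONING_TOKENS, GLM_52_REASONING_TOKENS)
efficiency = relative_score_per_token(HIGH_SCORE_FRACTION, HIGH_TOKEN_FRACTION)

print(f"GLM 5.1 -> GLM 5.2 reasoning tokens: {ratio:.2f}x")    # 2.20x
print(f"high vs max, score per token: at least {efficiency:.2f}x")  # 1.96x
```

So "more than doubled" checks out at roughly 2.2x, and the high level would deliver at least about 1.96x the score per token compared with max, if the graph holds for your workload.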

submitted by /u/perelmanych