r/LocalLLaMA · 1 min read

GLM 5.2, what speeds are we getting locally?

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

Can everyone who is able to run GLM 5.2 locally report their inference engine, system specs, quantization, context size, and tokens/sec? If you're getting great numbers, expect follow-up questions. I'll start:

llama.cpp, 6x RTX 3090, 128 GB DDR5, i7-13700K, unsloth UD-IQ2_M, 90K context with Q8_0 KV cache: 7.8 tokens/sec generation, roughly 40 tokens/sec prompt processing
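
For a sense of what those rates mean in wall-clock time, here is a rough Python sketch. It assumes the prompt fills the whole context, that "90K" means 90,000 tokens, and that both rates stay constant; the 500-token reply length and the helper name `seconds_for` are made up for illustration.

```python
def seconds_for(tokens: int, tokens_per_sec: float) -> float:
    """Seconds needed to handle `tokens` at a steady rate."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return tokens / tokens_per_sec


PROMPT_TPS = 40.0  # reported prompt processing rate (approximate)
GEN_TPS = 7.8      # reported generation rate

prompt_tokens = 90_000  # assumption: prompt fills the full 90K context
reply_tokens = 500      # assumption: hypothetical reply length

prefill_s = seconds_for(prompt_tokens, PROMPT_TPS)
decode_s = seconds_for(reply_tokens, GEN_TPS)

print(f"prefill: {prefill_s / 60:.1f} min, decode: {decode_s:.0f} s")
# prints "prefill: 37.5 min, decode: 64 s"
```

Real runs will differ, since both rates usually drop as the context fills up, so treat this as a ballpark only.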

submitted by /u/neverbyte
