r/LocalLLaMA · July 1, 2026

Thinking about grabbing 4x Ascend GX10s

#long-context #open-source #gpu #inference #security


Some people in this sub have tested GLM5.2 on 4x DGX Sparks (or Ascend GX10s) and reported 400-500 tok/s prompt processing and ~15 tok/s output at 128k context. Not blazing fast, but usable imo, especially with quantization.
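For a rough sense of what those numbers mean in wall-clock time, here is a back-of-envelope sketch. It assumes "128k" means 128,000 tokens, that the whole context is one uncached prompt, and that the reported rates stay flat across the run, none of which the reports confirm. The 1,000-token reply length and the function names are made up for illustration.

```python
def prefill_seconds(prompt_tokens: int, prefill_tok_per_s: float) -> float:
    """Time to process the prompt at a constant prefill rate."""
    return prompt_tokens / prefill_tok_per_s


def decode_seconds(output_tokens: int, decode_tok_per_s: float) -> float:
    """Time to generate the reply at a constant decode rate."""
    return output_tokens / decode_tok_per_s


# Assumption: "128k context" is a full, uncached 128,000-token prompt.
PROMPT_TOKENS = 128_000
# Assumption: a hypothetical 1,000-token reply, picked only for illustration.
REPLY_TOKENS = 1_000

# Rates as reported in the post: 400-500 tok/s prefill, ~15 tok/s output.
slow_prefill = prefill_seconds(PROMPT_TOKENS, 400)
fast_prefill = prefill_seconds(PROMPT_TOKENS, 500)
reply_time = decode_seconds(REPLY_TOKENS, 15)

print(f"prefill: {fast_prefill:.0f}-{slow_prefill:.0f} s")
print(f"1,000-token reply: {reply_time:.0f} s")
```

Under those assumptions, a cold full-context prompt takes roughly 4 to 5 minutes before the first output token, and a 1,000-token reply adds about another minute. Prompt caching or shorter prompts would change the picture a lot.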

My thinking: if there's an open-source fable 5 sometime in December or next year, I would rather already have hardware ready to run it at a speed I can live with. A 1000W power draw doesn't scare me off.

Anyone running this setup want to talk me out of it (or into it)?

submitted by /u/chikengunya