Thinking about grabbing 4x Ascent GX10s
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
Some in this sub have tested GLM5.2 on 4x DGX Sparks (or Ascent GX10s) and reported 400-500 tok/s prompt processing and ~15 tok/s output at 128k context. Not blazing fast, but usable imo, especially with quantization.
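To put those numbers in wall-clock terms, here is a rough back-of-the-envelope sketch. It assumes the prompt fills the whole context (taken as 128,000 tokens), that the quoted rates hold steady for the whole request, and a hypothetical 1,000-token reply; none of that is measured, it's just arithmetic on the figures above.

```python
# Rough latency estimate from the throughput figures quoted in the post.
# Assumptions (not measurements): a full 128,000-token prompt, constant
# throughput, and a hypothetical 1,000-token reply.

def prefill_seconds(prompt_tokens: int, prefill_tok_per_s: float) -> float:
    """Time to process the prompt at a given prompt-processing rate."""
    return prompt_tokens / prefill_tok_per_s

def decode_seconds(output_tokens: int, decode_tok_per_s: float) -> float:
    """Time to generate the reply at a given output rate."""
    return output_tokens / decode_tok_per_s

PROMPT_TOKENS = 128_000   # assumed: context completely full
OUTPUT_TOKENS = 1_000     # hypothetical reply length
DECODE_RATE = 15.0        # ~15 tok/s output, as quoted

for prefill_rate in (400.0, 500.0):  # quoted prompt-processing range
    wait = prefill_seconds(PROMPT_TOKENS, prefill_rate)
    gen = decode_seconds(OUTPUT_TOKENS, DECODE_RATE)
    print(f"prefill @ {prefill_rate:.0f} tok/s: {wait / 60:.1f} min to first token, "
          f"then {gen:.0f} s for {OUTPUT_TOKENS} output tokens")
```

So with a completely full context you'd be waiting roughly 4 to 5 minutes before the first token, unless the prompt is cached or much shorter.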
My thinking: if there's an open-source fable 5 sometime in December or next year, I would rather already have hardware ready to run it at a speed I can live with. 1000W power draw doesn't scare me off.
Anyone running this setup want to talk me out of it (or into it)?