
Cheapest way to run GLM 5.x locally that's not a unified memory system?

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

This is primarily an exercise to map out the possible options, however obscure, for running at least a 4-bit quant (let's say roughly IQ4_XS).

  1. Got a CPU-only setup? Please share your experience. A Sapphire Rapids ES (56-core) + DDR5 build might be an option.

  2. Multi-GPU setups with partial or complete offloading? What's your performance like?

  3. This isn't limited to GLM 5.x; anything similarly sized is within scope for this discussion.

Personally, I'm running a 5900X + 128GB DDR4 + 7900XT 20GB. The largest model I can run is MiniMax M2.7 from AesSedai at Q4_K_S - https://huggingface.co/AesSedai/MiniMax-M2.7-GGUF

For smaller stuff, it's still Qwen 3.6 27B at IQ4_XS from Unsloth/Bartowski.
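For anyone comparing setups, here is a rough back-of-envelope sketch for checking whether a given quant fits in RAM + VRAM. It is only an estimate under stated assumptions: 4.25 bits per weight is the nominal figure for IQ4_XS (real GGUF files mix tensor types and usually come out somewhat larger), the 16 GB overhead for OS, KV cache and buffers is a guess, the parameter counts in the loop are hypothetical placeholders rather than the size of any specific model, and the function names are made up for illustration.

```python
# Back-of-envelope sizing for a quantized model; all inputs are assumptions.

IQ4_XS_BPW = 4.25  # nominal bits per weight; real GGUF files usually land a bit higher


def quant_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    # params_billion * 1e9 weights * bpw bits / 8 bits per byte / 1e9 bytes per GB
    return params_billion * bits_per_weight / 8


def fits(params_billion, bits_per_weight, ram_gb, vram_gb, overhead_gb=16.0):
    """True if the weights fit in RAM + VRAM after reserving overhead_gb
    (a guessed allowance for OS, KV cache and compute buffers)."""
    budget_gb = ram_gb + vram_gb - overhead_gb
    return quant_size_gb(params_billion, bits_per_weight) <= budget_gb


if __name__ == "__main__":
    # Hypothetical parameter counts, checked against 128 GB RAM + 20 GB VRAM.
    for params_b in (27, 230, 400, 750):
        size = quant_size_gb(params_b, IQ4_XS_BPW)
        ok = fits(params_b, IQ4_XS_BPW, ram_gb=128, vram_gb=20)
        print(f"{params_b}B @ IQ4_XS ~ {size:.0f} GB, fits: {ok}")
```

Fitting in memory says nothing about speed: on CPU or partially offloaded setups, token generation is mostly bound by memory bandwidth, so treat this as a first filter only.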

submitted by /u/Monad_Maya
