Are the rich-RAM / poor-GPU people wrong here?
Hello Guys,
I know everyone has their own definition of local models, but for me there are two "reasonable" types of frontier local model.
A dense one that barely fits in 24GB or 32GB of VRAM, for the most "reasonable" GPU-wealthy guys, and an MoE around 100B params. The 100-ish-B models can run with hybrid offload at a decent speed on 128GB of RAM, since 128GB is the max a standard motherboard supports. Again, it's not exactly cheap, but common people can still afford it; it's still cheaper than a car 😄.
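To put rough numbers on that, here is a minimal back-of-the-envelope sketch in Python; the bits-per-weight figures are illustrative assumptions, and it ignores KV cache and runtime overhead:

```python
# Rough weight-size estimate for a quantized GGUF-style model (sketch).
# bpw values are illustrative assumptions; KV cache and overhead ignored.

def weight_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB: billions of params * bpw / 8."""
    return params_b * bits_per_weight / 8

# A ~100B MoE at ~4.5 bpw (Q4_K_M-ish) fits comfortably in 128GB of RAM:
print(f"100B @ ~4.5 bpw ~ {weight_gb(100, 4.5):.0f} GB")  # ~56 GB

# A dense ~32B at the same quant needs ~18 GB, hence the 24GB-card target:
print(f"32B  @ ~4.5 bpw ~ {weight_gb(32, 4.5):.0f} GB")   # ~18 GB

# Cramming a ~671B model (DeepSeek V3-sized) into 128GB forces ~Q1:
print(f"671B @ ~1.6 bpw ~ {weight_gb(671, 1.6):.0f} GB")  # ~134 GB, barely
```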
We see a lot of dense models near that limit, like Qwen 27B, but for the ~100B MoE type there was only the Qwen 3.5 122B, and they didn't even release a 3.6. The best MoE models otherwise sit in the 30-35B range.
Does it mean that for the rich-RAM, poor-GPU people there isn't much choice, and that big GPUs were the only good road?
Of course you can cram something MiniMax-like at Q3, or DeepSeek V3 at Q1, but for tool calling, speed, and real usage it's barely usable.
I bought a Strix Halo before the RAM-pocalypse, but I see very few use cases for the 128GB except being able to load multiple models, and that can be done with llama-swap anyway.
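For the multi-model angle, here is a minimal llama-swap config sketch; the model names, file paths, quant choices, and the `-ot "exps=CPU"` expert-offload flag are assumptions to check against your own llama.cpp build:

```yaml
# llama-swap config.yaml (sketch): one OpenAI-compatible endpoint that
# loads/unloads models on demand, so they don't all have to fit at once.
models:
  "qwen-dense-32b":
    cmd: >
      llama-server --port ${PORT}
      -m /models/qwen-32b-q4_k_m.gguf
      -ngl 99
  "big-moe-100b":
    cmd: >
      llama-server --port ${PORT}
      -m /models/moe-100b-q4_k_m.gguf
      -ngl 99 -ot "exps=CPU"
```

The `-ot "exps=CPU"` override keeps the big expert tensors in system RAM while attention and shared layers stay on the GPU, which is the usual hybrid-offload trick for ~100B MoEs.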