A quick Gemma 4 31B comparison (Q4_K_M, QAT, heretic)
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
No numbers. Not sure if anybody cares…
I’ve run the UD version of Q4_K_M for a month. I talk to this model nicely, because it’s a functional nervous wreck. Initially I thought that might be an alignment thing, so I also keep the heretic version around for when I need a breather from this hypervigilant overachiever of an LLM.
Don’t get me wrong. It’s great! It works well... most of the time. But when the context gets long (in my case, 20k!), the chain of tool calls gets long, or it knows that it has previously made a mistake, it just falls apart.
The heretic version, on the other hand, couldn’t care less if it makes a mistake, yet it still makes plenty.
Then I tried the QAT version for a few hours. This one is a zen master. Handling 32k context with full reasoning is a piece of cake. It does everything right and doesn’t try too hard.
The “nervous” Gemma is probably a quant thing. Trying to match full precision as a Q4 is hard, I guess. For longer context and maintaining precision, QAT is looking pretty good.