r/LocalLLaMA

A quick Gemma4 31B comparison (Q4_K_M, QAT, heretic)

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

No numbers. Not sure if anybody cares…

I’ve been running the UD version of Q4_K_M for a month. I talk to this model nicely because it’s a functional nervous wreck. Initially I thought that might be an alignment thing, so I also keep the heretic version around for when I need a breather from this hypervigilant overachiever LLM.

Don’t get me wrong. It’s great! It works well… most of the time. It’s when the context gets long (20k in my case!), the tool chain gets long, or it knows it has previously made a mistake that it just falls apart.

The heretic version, on the other hand, doesn’t give a dime if it makes a mistake, yet it still makes plenty.

Then I tried the QAT (quantization-aware training) version for a few hours. This one is a zen master. Handling 32k of context with full reasoning is a piece of cake. It does everything right and doesn’t try too hard.

The “nervous” Gemma is probably a quantization thing. I guess it’s hard for a Q4 quant to behave like the full-precision model. For long context and holding onto precision, QAT is looking pretty good.

submitted by /u/Some-Cauliflower4902