r/LocalLLaMA

How to compare Original vs QAT Gemma 4 31B Q4 quants

Mirrored from r/LocalLLaMA for archival readability.

I just came across the following post, where a user reported confusing divergence results when comparing Q4 quants of both the original and the QAT model against a Q8/unquantized reference of the original model.

https://www.reddit.com/r/LocalLLaMA/comments/1tyxu55/gemma_4_31b_qat_q4_vs_standard_q4_top1_kld/

From that thread I understood that, because QAT involves retraining, Gemma 4 31B QAT can be considered a different model from the original Gemma 4 31B. It is therefore not useful to measure the divergence of Gemma 4 31B QAT Q4 quants against a reference of the original Gemma 4 31B: the two models are not expected to behave the same way, so the result mixes the effect of retraining with the effect of quantization.
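A toy illustration of that point, using made-up four-token distributions (the numbers are hypothetical, not measurements of Gemma 4): even if a QAT Q4 quant reproduced its own unquantized model perfectly, its KL divergence against the original model's reference would still be nonzero, because the retraining itself moved the distribution.

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    mask = p > 0  # terms where p == 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Hypothetical next-token distributions over a 4-token vocabulary.
original_ref = np.array([0.70, 0.20, 0.05, 0.05])  # unquantized original model
qat_ref = np.array([0.55, 0.30, 0.10, 0.05])       # unquantized QAT model (retrained)
qat_q4 = qat_ref.copy()  # idealized case: the Q4 quant reproduces QAT exactly

# Against its own reference, the idealized QAT Q4 shows zero divergence...
print(kl_divergence(qat_ref, qat_q4))       # 0.0
# ...but against the original reference it inherits the retraining shift.
print(kl_divergence(original_ref, qat_q4))  # about 0.053
```

The second number is pure "model difference", with no quantization error at all, which is why a cross-model KLD cannot tell you how well the QAT model quantizes.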

Then I wondered: how could one check whether a Q4 of the original model or a Q4 of the QAT version performs better?

I think the first step should be running a few benchmarks (e.g., SuperGPQA, HLE, MMLU) on the unquantized Gemma 4 31B QAT model, to assess whether, and by how much, the retraining degraded overall model performance relative to the original.
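One way to make "whether, and by how much" concrete is to score both unquantized models on the same questions and look at the paired accuracy difference with a bootstrap confidence interval, so that a small gap on a limited number of items is not over-read. This is a minimal sketch, assuming you already have per-question 0/1 correctness for each model on one benchmark; the function name and the data below are hypothetical, and a paired bootstrap is just one reasonable choice of test.

```python
import numpy as np

def paired_accuracy_delta(correct_a, correct_b, n_boot=10_000, seed=0):
    """Accuracy difference (b - a) on the same items, with a paired bootstrap 95% CI."""
    a = np.asarray(correct_a, dtype=np.float64)
    b = np.asarray(correct_b, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError("both models must be scored on the same items")
    delta = b.mean() - a.mean()
    rng = np.random.default_rng(seed)
    n = a.size
    # Resample item indices jointly so each question keeps its (a, b) pairing.
    idx = rng.integers(0, n, size=(n_boot, n))
    boot = b[idx].mean(axis=1) - a[idx].mean(axis=1)
    low, high = np.percentile(boot, [2.5, 97.5])
    return float(delta), float(low), float(high)

# Hypothetical per-question correctness (1 = correct) on one benchmark.
original = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
qat = [1, 0, 0, 1, 0, 1, 1, 1, 1, 0]
delta, low, high = paired_accuracy_delta(original, qat)
print(f"QAT - original accuracy: {delta:+.2f} (95% CI {low:+.2f} to {high:+.2f})")
```

Pairing by question matters here: both models see identical items, so resampling items jointly removes question-difficulty noise that an unpaired comparison would leave in.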

Afterwards, one should measure the divergence of the Gemma 4 31B QAT Q4 quants against the unquantized QAT reference, and the divergence of the original Gemma 4 31B Q4 quants against the unquantized original reference.
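In practice, llama.cpp's `llama-perplexity` tool supports this kind of measurement (`--kl-divergence-base` to store a reference model's logits, `--kl-divergence` to score a quant against them). To make explicit what is being computed, here is a minimal NumPy sketch, assuming you already have logit arrays of shape `(num_tokens, vocab_size)` from a reference model and a quant evaluated on the same tokens with the same tokenizer. The function names and the `*_logits` names in the comments are hypothetical, and the random arrays are stand-ins, not real model outputs.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def divergence_stats(ref_logits, quant_logits):
    """Mean per-token KL(ref || quant) and top-1 agreement rate.

    Both arrays: shape (num_tokens, vocab_size), same tokens, same tokenizer.
    """
    log_p = log_softmax(np.asarray(ref_logits, dtype=np.float64))
    log_q = log_softmax(np.asarray(quant_logits, dtype=np.float64))
    kld_per_token = (np.exp(log_p) * (log_p - log_q)).sum(axis=-1)
    top1_agree = (log_p.argmax(axis=-1) == log_q.argmax(axis=-1)).mean()
    return float(kld_per_token.mean()), float(top1_agree)

# Each quant is scored only against the unquantized model it came from, e.g.:
#   divergence_stats(qat_unquantized_logits, qat_q4_logits)    (hypothetical names)
#   divergence_stats(orig_unquantized_logits, orig_q4_logits)  (hypothetical names)
rng = np.random.default_rng(0)
ref = rng.normal(size=(8, 32))                  # stand-in reference logits
quant = ref + 0.1 * rng.normal(size=ref.shape)  # stand-in "quantized" logits
kld, top1 = divergence_stats(ref, quant)
print(f"mean KLD {kld:.5f}, top-1 agreement {top1:.2%}")
```

Reporting both numbers is useful because they capture different things: mean KLD reflects how much the whole distribution shifted, while top-1 agreement only tracks whether greedy decoding would pick the same token.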

I believe these results combined should provide a fair comparison of how much better the QAT model quantizes to Q4, and of whether it preserves the quality of the original model. The same methodology could also be used to compare how Q6 quants fare in each case.

Nevertheless, I am not an expert in the field, and there may be more straightforward ways to analyze this that I am unaware of. So I wanted to start a discussion here and hear what people think would be the best way to approach this.

Looking forward to reading your opinion in the comments!

submitted by /u/Hot_Strawberry1999