r/LocalLLaMA

Gemma 4 QAT accuracy inconsistencies

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.


Table from https://unsloth.ai/docs/models/gemma-4/qat#qat-analysis

I've heard that MoE models are usually more susceptible to quantization error, but what happened with the 12B? I thought smaller models usually quantize worse, and yet E2B/E4B are nearly perfect while the 12B deviates from FP16 the most. Is there an explanation for that, or did something go wrong with the 12B in particular during Google's quantization-aware training (QAT)?

If any of the authors of the linked post are reading this, I'd also be interested in the exact methodology used here and in comparisons to non-QAT variants, if there are any (maybe non-QAT actually performs better here)!
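The linked table's methodology isn't stated in this post, so the following is only a sketch of two metrics commonly used to quantify how far a quantized model "deviates from FP16": the mean KL divergence between the reference and quantized next-token distributions, and top-1 token agreement. It assumes you have already collected per-position logits from both models on the same input text; the function names (`compare_to_reference` etc.) are hypothetical, and this is not presented as Unsloth's actual method.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(P || Q) in nats; P is the reference (e.g. FP16) distribution."""
    # Terms with p_i == 0 contribute nothing; eps guards against log(x / 0).
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def compare_to_reference(
    ref_logits: Sequence[Sequence[float]],
    quant_logits: Sequence[Sequence[float]],
) -> Tuple[float, float]:
    """Return (mean KL, top-1 agreement) over a batch of token positions.

    Hypothetical helper: ref_logits and quant_logits hold one logit vector per
    token position, from the reference and quantized models on the same text.
    """
    if len(ref_logits) != len(quant_logits) or not ref_logits:
        raise ValueError("need equal, non-empty lists of per-position logits")
    kls = []
    agree = 0
    for r, q in zip(ref_logits, quant_logits):
        kls.append(kl_divergence(softmax(r), softmax(q)))
        if argmax(r) == argmax(q):
            agree += 1
    n = len(kls)
    return sum(kls) / n, agree / n


if __name__ == "__main__":
    # Toy 3-token vocabulary, two positions (made-up numbers, not real data).
    ref = [[2.0, 1.0, 0.0], [0.5, 0.5, 3.0]]
    quant = [[1.0, 2.0, 0.0], [0.5, 0.6, 2.0]]
    print(compare_to_reference(ref, quant))
```

Lower mean KL and higher top-1 agreement mean the quantized model tracks the reference more closely; comparing a QAT checkpoint and a non-QAT (post-training quantized) checkpoint against the same FP16 logits would be one way to run the comparison asked about above.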

submitted by /u/ai_fonsi