Gemma 4 QAT accuracy inconsistencies
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
Table from https://unsloth.ai/docs/models/gemma-4/qat#qat-analysis

I've heard that MoE models are usually more susceptible to quantization error, but what happened with the 12B? I thought smaller models usually quantize worse, yet E2B/E4B are pretty much perfect while the 12B deviates from FP16 the most. Do we have an explanation for that, or did something maybe go wrong during quantization-aware training on Google's side with the 12B in particular?

I'd also be interested in the exact methodology used here, and in comparisons to non-QAT variants, if any of the authors of the post linked above are reading this (maybe non-QAT actually performs better here)!