r/LocalLLaMA · June 9, 2026 · 3 min read

Gemma 4 26B A4B IT QAT Comparison


Hopefully this isn't too low effort of a post. I just finished the benchmarks and I figured I'd post them online because they certainly were insightful for me. I did not use any AI other than asking Gemini 3.1 Pro if it was statistically significant because I was too tired to do inferential statistics.

Methodology:
I used oMLX to run Gemma 4 26B A4B IT from mlx-community, with the following models:

Gemma 26B 4 Bit: https://huggingface.co/mlx-community/gemma-4-26b-a4b-it-4bit
Gemma 26B 6 Bit: https://huggingface.co/mlx-community/gemma-4-26b-a4b-it-6bit
Gemma 26B QAT 8 Bit: https://huggingface.co/mlx-community/gemma-4-26B-A4B-it-qat-8bit

I ran them on a MacBook (M5 Pro, 64GB) with oMLX version 0.4.1, an unquantized KV cache, and thinking enabled.

I ran the following tests on all models: 50 MMLU_PRO questions, and 100 HumanEval questions.

The only difference in the chat templates between these models relates to multimodal tool calls, so it did not affect the results. Additionally, they were all quantized using the same method, so apart from bit width the only variable should be the original model weights.

I chose the 8 bit QAT to avoid confounding variables from any MLX-specific quantization damage. My goal was to compare a QAT model that is as close as possible to the released QAT weights against the original (non-QAT) model. This model should be virtually identical to the unsloth q4_k_xl quant of the QAT model. (I mean legitimately very close to identical, not "TQ4 is basically BF16 identical".)

I chose to compare it to an MLX 4 bit and 6 bit quant, as both bit widths fall in the range where users have expressed uncertainty about replacing their old quant with a new QAT model.

Results:

Model	Benchmark	Percentage (Correct/Total)
Gemma 4 26B IT 4 Bit	MMLU_PRO	56.0% (28/50)
Gemma 4 26B IT 4 Bit	HUMANEVAL	90.0% (90/100)
Gemma 4 26B IT 6 Bit	MMLU_PRO	58.0% (29/50)
Gemma 4 26B IT 6 Bit	HUMANEVAL	98.0% (98/100)
Gemma 4 26B IT QAT 8 Bit	MMLU_PRO	52.0% (26/50)
Gemma 4 26B IT QAT 8 Bit	HUMANEVAL	90.0% (90/100)

Interpretation:
Both chi-squared tests and z tests were performed by Gemini.

The only statistically convincing evidence of a difference across all these benchmarks is on HUMANEVAL, where the QAT 8 Bit model performs worse than the 6 Bit model (90/100 vs 98/100). Note that the regular 4 Bit model also scored 90/100, so the same test puts it behind the 6 Bit model by the same margin. The performance differences seen on MMLU_PRO are not statistically significant and can be attributed to random chance due to the smaller sample size (50 questions).
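For anyone who wants to check the numbers by hand, here is a minimal sketch of a pooled two-proportion z test in plain Python. This is not necessarily the exact procedure Gemini ran; the function name `two_proportion_z_test` is made up for illustration, and the counts are copied from the results table. It also treats the two models as independent samples, even though they answered the same questions, so a paired test on per-question results (which are not posted here) would be the more rigorous choice.

```python
import math

def two_proportion_z_test(correct_a, n_a, correct_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a = correct_a / n_a
    p_b = correct_b / n_b
    # Pooled accuracy under the null hypothesis that both models are equally good
    pooled = (correct_a + correct_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Two-sided tail probability of the standard normal, via erfc
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Counts copied from the results table: 6 Bit vs QAT 8 Bit
z_he, p_he = two_proportion_z_test(98, 100, 90, 100)    # HUMANEVAL
z_mmlu, p_mmlu = two_proportion_z_test(29, 50, 26, 50)  # MMLU_PRO

print(f"HUMANEVAL: z = {z_he:.2f}, p = {p_he:.3f}")
print(f"MMLU_PRO:  z = {z_mmlu:.2f}, p = {p_mmlu:.3f}")
```

With the posted counts this gives roughly z ≈ 2.4 (p ≈ 0.02) for HUMANEVAL and z ≈ 0.6 (p ≈ 0.55) for MMLU_PRO, which lines up with the interpretation above at the usual 0.05 threshold.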

Thus the conclusion that I have reached is that the QAT model is worse than a Q6 quant of the original model. This means that the claim that "QAT is indistinguishable from BF16" or "the distributions are very close" is likely wrong. The full QAT model is unlikely to beat the 8 bit QAT quant I tested, while the full non-QAT model is very likely to beat the q6 quant, so the real gap is probably wider than the one I was able to measure.

QAT was not clearly better or worse than a regular MLX q4 quant. Now, for GGUF, QAT likely still smashes Q4_0 out of the park and might even be competitive with IQ4_XS, but it seems that the assumption that q4_k, q5, and even q6 quants should be replaced with QAT quants is a bit early.

I might run more tests on the 26B, or even test out the 31B model later, as the sample sizes that I have are just enough to begin to get an idea.
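To give a feel for how much uncertainty 50 questions leaves, here is a rough sketch that computes 95% Wilson score intervals for the MMLU_PRO results. The helper name `wilson_interval` is made up for illustration, and the counts are taken from the results table.

```python
import math

def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half_width, center + half_width

# MMLU_PRO counts copied from the results table
qat_low, qat_high = wilson_interval(26, 50)  # QAT 8 Bit
q6_low, q6_high = wilson_interval(29, 50)    # 6 Bit

print(f"QAT 8 Bit: {qat_low:.1%} to {qat_high:.1%}")
print(f"6 Bit:     {q6_low:.1%} to {q6_high:.1%}")
```

Both intervals come out more than 25 percentage points wide and overlap heavily, which is why the MMLU_PRO differences here cannot separate the models.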

Creative writing may be different, but I mainly wanted to measure similarity with the original model, and worse benchmark performance is by definition indicative of dissimilarity.

Also this is a MoE, and so maybe the QAT works better on the 31B.

TL;DR: Based on these results, Gemma 4 QAT unquantized appears to be inferior to Gemma 4 unquantized, so it might not make sense to replace 5, 6, or even dynamic 4 bit quants with Gemma 4 26B QAT. These observations may not generalize to the 31B, 12B, or E2/4B.

submitted by /u/GoodTip7897