Qwen3.6-27B Quantization Benchmark
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
Hi everyone! This is my attempt to benchmark and compare the quality of some of the well-known Qwen3.6 27B quantizations on HuggingFace (unsloth, mradermacher, and the IQ4_XS quants from cHunter789 and Ununnilium), from Q8 all the way down to Q2.

Measurement method

I'm using llama.cpp for the measurements. All runs used the same context length of 8192 tokens, with the KV cache quantized to q8_0 so that the entire model fits in the GPU.

Understanding KLD and Same Top P

To understand the test results, it helps to understand the difference between the two metrics I used. When an LLM predicts the next word of a given prompt, for example "Today I will do my", it looks at its entire vocabulary and assigns a confidence score to every single token. It then samples from the top tokens and picks the final one, based on the given temperature.
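
To make the two metrics concrete, here is a minimal Python sketch. It assumes the usual definitions: KLD is the KL divergence of the quantized model's next-token distribution from the reference model's, averaged over positions, and Same Top P is the percentage of positions where both models rank the same token first. The function names and all probability values below are made up for illustration; this is not the benchmark script.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats for two distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def same_top(p, q):
    """True when both distributions rank the same token first."""
    return p.index(max(p)) == q.index(max(q))

# Hypothetical next-token probabilities over a 4-token vocabulary,
# one row per position. All numbers are invented for this example.
reference = [
    [0.70, 0.20, 0.05, 0.05],
    [0.40, 0.35, 0.15, 0.10],
    [0.10, 0.80, 0.05, 0.05],
]
quantized = [
    [0.65, 0.25, 0.05, 0.05],  # same top token, small drift
    [0.36, 0.38, 0.16, 0.10],  # top token flipped
    [0.25, 0.55, 0.10, 0.10],  # same top token, large drift
]

klds = [kl_divergence(p, q) for p, q in zip(reference, quantized)]
mean_kld = sum(klds) / len(klds)
matches = sum(same_top(p, q) for p, q in zip(reference, quantized))
same_top_pct = 100.0 * matches / len(reference)

print(f"Mean KLD: {mean_kld:.6f}")
print(f"Same top token: {same_top_pct:.1f}%")
```

Note the third position: the top token still matches, yet it has the largest divergence of the three. That is the kind of drift a top-token match alone cannot show.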
So, while you might get a good token choice with the quantized model (Same Top P is high), it's important to also look at the Mean KLD to see how stable the model's underlying probability distribution is: the lower, the better.

Benchmark result

Unsloth's quantization

Nothing special: higher quants are better than lower quants. Q6 to Q8 are pretty much lossless. Q8_0 has a higher Same Top P, but underneath, the Mean KLD tells us that UD-Q8_K_XL is better. Anything below Q4 is for the desperate, like the 5060 Ti 16GB club.

The 4-bit cluster is a bit more interesting. Different people may have different takes on this, but to me, Q4_K_XL is a good quality compromise if you can afford the VRAM. If you're tight, IQ4_XS could serve you well, and IQ4_NL is not much different. In that case, there's no need to stretch for Q4_K_M, and you can skip Q4_K_S.

From Q3_K_XL down, the quality degradation is more drastic. The KLD is above 0.1 across the board and matching token selection drops to 85-90%, which says a lot about the instability.

mradermacher's and other quants

I've seen people mention mradermacher's i1 quants here and there, as well as the IQ4_XS quants from cHunter789 and Ununnilium. I have personally been using Ununnilium's IQ4_XS for a while now. So I want to put them all on the same table to see how they compare. A single diagram is not enough, so I will break them into 4 groups: Q8-Q6, Q5, Q4, and Q3 and below.

8-bit and 6-bit quantization

mradermacher's Q6_K seems to be a clear winner over Unsloth's Q6_K here. The mean KLD is near perfect (0.027352), with a 97.011% token selection match.

5-bit quantization

In this group, Unsloth is the winner. With only about 300-500MB difference in size, you can skip Q5_K_S and go for Q5_K_M. Unsloth's Q5_K_M is clearly better in both matching token selection and KLD.

4-bit quantization

Unsloth beats all of the 4-bit quants here.
But if you are looking for alternative quants to save VRAM, for example on a 16GB card, pay attention to IQ4_XS (it will help, but of course you will not be able to get above a 65k context window). mradermacher's IQ4_XS is a clear winner among the IQ4_XS quants, but at 15.1 GB it would be a bit tight. cHunter's IQ4_XS is also very good at 14.7 GB.

3-bit and below

Again, mradermacher's quants fill in the gaps between Unsloth's quants here, so you get a bit more choice, but tbh, in this range you're better off with Unsloth's Q3_K_XL or at least Q3_K_M. I was very interested to see how some newer quants like IQ3_S and IQ3_M perform, but they turned out to be a bit disappointing.

Raw benchmark data

If you are interested, here's the raw benchmark data table from all the runs.

There are many more Qwen3.6 27B quantizations on HuggingFace, like the ones from bartowski, huihui, and others. Within my time budget (not money budget, since I'm basically using modal.com's free monthly credit :P), I cannot benchmark them all. If you are interested in doing your own benchmark, I also attached the script in my original blog post, so you can run it on your own. See it here: https://www.huy.rocks/everyday/05-29-2026-ai-qwen3-6-27b-quantization-benchmark

Would love to see the results if any of you decide to run it on your own. Thanks for reading this far!