New Qwen3.6 27B AutoRound Quant (int4) Best Recipe
I've been using the int4 AutoRound quant from "Lorbus/Qwen3.6-27B-int4-AutoRound" and it has been pretty good! Great quality and performance on an RTX 5090 with vLLM.
I decided to use a similar AutoRound recipe, but with the "auto-round-best" preset instead, which runs more tuning iterations to improve quality. I have created a default version and a code-calibrated quant, both at int4. The recipe and calibration dataset can be found within the model card; a rough sketch of the quantization step follows the links below.
webhie/Qwen3.6-27B-int4-AutoRound (Best Recipe)
webhie/Qwen3.6-27B-int4-AutoRound-Code (Best Recipe)
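For reference, here is a minimal sketch of what an "auto-round-best"-style run looks like with AutoRound's Python API. The base model name and the exact hyperparameters are assumptions on my part (the best preset roughly means more tuning iterations and a larger calibration set than the defaults); the authoritative recipe is the one in the model card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

# Base checkpoint name is an assumption; substitute the real one.
model_name = "Qwen/Qwen3.6-27B"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

autoround = AutoRound(
    model,
    tokenizer,
    bits=4,
    group_size=128,
    sym=True,
    iters=1000,    # "best" preset: more tuning iterations than the default
    nsamples=512,  # and a larger calibration sample count
    # dataset="NeelNanda/pile-10k",  # AutoRound's default calibration data;
    # swap in a code dataset to reproduce the code-calibrated variant
)
autoround.quantize()
autoround.save_quantized("./Qwen3.6-27B-int4-AutoRound", format="auto_round")
```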
Token generation: 60-80 tok/s (without MTP) and 130-160 tok/s (with MTP, 3 speculative tokens).
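For anyone trying to reproduce the MTP numbers in vLLM, multi-token prediction is enabled through the speculative decoding config. A minimal sketch follows; the "method" string is an assumption on my part, since vLLM registers model-family-specific MTP method names, so check its speculative decoding docs for the right value.

```python
from vllm import LLM, SamplingParams

# "method" is an assumption: vLLM uses per-family MTP method names
# (e.g. "deepseek_mtp"), so use whatever it documents for this model.
llm = LLM(
    model="webhie/Qwen3.6-27B-int4-AutoRound",
    speculative_config={
        "method": "mtp",
        "num_speculative_tokens": 3,  # matches the "mtp 3" figure above
    },
)

outputs = llm.generate(
    ["Explain int4 quantization in one sentence."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```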
Note: this model is extremely sensitive to chat template changes. If you encounter issues (looping, incomplete responses, etc.) with any other Qwen 3.6 model, try the v11 template from froggeric/Qwen-Fixed-Chat-Templates on Hugging Face.
V11 is included with the HF quant.
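If you do need to swap templates yourself, one way is to override the tokenizer's bundled template before building prompts. A sketch, assuming you have downloaded the fixed template to a local Jinja file (the filename below is a placeholder):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("webhie/Qwen3.6-27B-int4-AutoRound")

# Filename is a placeholder; grab the actual v11 file from froggeric's repo.
with open("qwen_chat_template_v11.jinja") as f:
    tok.chat_template = f.read()

prompt = tok.apply_chat_template(
    [{"role": "user", "content": "Hello!"}],
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)
```

When serving with vLLM, the same override can be passed via the --chat-template flag.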