New Qwen3.6 27B AutoRound Quant (int4) Best Recipe
I've been using the int4 AutoRound quant from "Lorbus/Qwen3.6-27B-int4-AutoRound" and it has been pretty good! Great quality and performance on an RTX 5090 with vLLM.
I decided to use a similar AutoRound recipe, but with the "auto-round-best" preset instead, which runs more tuning iterations to improve quality. I have created a default version and a code-calibrated quant, both at int4. The recipe and calibration dataset can be found within the model card; a rough sketch of the quantization step follows the links below.
webhie/Qwen3.6-27B-int4-AutoRound (Best Recipe)
webhie/Qwen3.6-27B-int4-AutoRound-Code (Best Recipe)
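For reference, here is a minimal sketch of what an "auto-round-best"-style run looks like with AutoRound's Python API. The base model name and the exact hyperparameters are assumptions on my part (the best preset roughly means more tuning iterations and a larger calibration set than the defaults); the authoritative recipe is the one in the model card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

# Base checkpoint name is an assumption; substitute the real one.
model_name = "Qwen/Qwen3.6-27B"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

autoround = AutoRound(
    model,
    tokenizer,
    bits=4,
    group_size=128,
    sym=True,
    iters=1000,    # "best" preset: more tuning iterations than the default
    nsamples=512,  # and a larger calibration sample count
    # dataset="NeelNanda/pile-10k",  # AutoRound's default calibration data;
    # swap in a code dataset to reproduce the code-calibrated variant
)
autoround.quantize()
autoround.save_quantized("./Qwen3.6-27B-int4-AutoRound", format="auto_round")
```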
Token generation: 60-80 tok/s (without MTP) and 130-160 tok/s (with MTP, 3 speculative tokens).
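For anyone trying to reproduce the MTP numbers in vLLM, multi-token prediction is enabled through the speculative decoding config. A minimal sketch follows; the "method" string is an assumption on my part, since vLLM registers model-family-specific MTP method names, so check its speculative decoding docs for the right value.

```python
from vllm import LLM, SamplingParams

# "method" is an assumption: vLLM uses per-family MTP method names
# (e.g. "deepseek_mtp"), so use whatever it documents for this model.
llm = LLM(
    model="webhie/Qwen3.6-27B-int4-AutoRound",
    speculative_config={
        "method": "mtp",
        "num_speculative_tokens": 3,  # matches the "mtp 3" figure above
    },
)

outputs = llm.generate(
    ["Explain int4 quantization in one sentence."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```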
Note: this model is extremely sensitive to chat template changes. If you encounter issues (looping, incomplete responses, etc.) with any other Qwen 3.6 model, try the v11 template from froggeric/Qwen-Fixed-Chat-Templates on Hugging Face.
V11 is included with the HF quant.
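If you do need to swap templates yourself, one way is to override the tokenizer's bundled template before building prompts. A sketch, assuming you have downloaded the fixed template to a local Jinja file (the filename below is a placeholder):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("webhie/Qwen3.6-27B-int4-AutoRound")

# Filename is a placeholder; grab the actual v11 file from froggeric's repo.
with open("qwen_chat_template_v11.jinja") as f:
    tok.chat_template = f.read()

prompt = tok.apply_chat_template(
    [{"role": "user", "content": "Hello!"}],
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)
```

When serving with vLLM, the same override can be passed via the --chat-template flag.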