r/LocalLLaMA

cyankiwi AWQ 4-bit — 26.05 update, NVFP4 + FP8 Dynamic quantization and benchmarks across Qwen3.6 4-bit quants

We are happy to share the cyankiwi AWQ 26.05 update: an improved AWQ implementation, now with NVFP4 and FP8 Dynamic quantization support. We measured KL divergence (KLD) against the BF16 baseline for 4-bit Qwen3.6 quants, using GPQA Diamond responses synthesized with the BF16 Qwen3.6 model. Lower KLD means the quant's output distribution stays closer to BF16.
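For readers who have not worked with the metric, here is a minimal sketch of how a per-token KLD against a reference model can be computed from logits. The post does not state the direction, the log base, or how values are averaged, so the choices below (KL(BF16 || quant), natural log, plain mean over token positions) and the function names `token_kld` and `mean_kld` are assumptions for illustration, not the script used for these benchmarks.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def token_kld(ref_logits, quant_logits):
    """KL(P_ref || Q_quant) in nats for a single token position (assumed direction)."""
    log_p = log_softmax(ref_logits)
    log_q = log_softmax(quant_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def mean_kld(ref_seq, quant_seq):
    """Average the per-token KLD over all positions of a response (assumed reduction)."""
    per_token = [token_kld(r, q) for r, q in zip(ref_seq, quant_seq)]
    return sum(per_token) / len(per_token)


# Toy example with a 2-token vocabulary: reference p = [0.5, 0.5], quant q = [0.75, 0.25]
ref = [[0.0, 0.0]]
quant = [[math.log(3.0), 0.0]]
print(mean_kld(ref, quant))
```

In a real run both models would be fed the same token sequence (here, the BF16-generated responses) so the logits line up position by position.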

The cyankiwi AWQ releases have the lowest KLD on both the 27B dense model and the 35B-A3B MoE model.

Qwen3.6-27B (dense)

| Model | Weight size | KLD |
|---|---|---|
| Lorbus/Qwen3.6-27B-int4-AutoRound | 17.69 GiB | 0.031682 |
| Intel/Qwen3.6-27B-int4-AutoRound | 17.69 GiB | 0.032569 |
| sakamakismile/Qwen3.6-27B-NVFP4 | 18.36 GiB | 0.092948 |
| rdtand/Qwen3.6-27B-PrismaSCOUT-Blackwell-NVFP4-BF16-vllm | 18.79 GiB | 0.040911 |
| cyankiwi/Qwen3.6-27B-AWQ-INT4 | 19.04 GiB | 0.020443 |
| berkerdooo/Qwen3.6-27B-NVFP4 | 19.15 GiB | 0.043821 |
| ocicek/Qwen3.6-27B-NVFP4 | 19.15 GiB | 0.092993 |
| QuantTrio/Qwen3.6-27B-AWQ | 20.35 GiB | 0.034925 |
| unsloth/Qwen3.6-27B-NVFP4 | 24.57 GiB | 0.039140 |
| QuantTrio/Qwen3.6-27B-AWQ-6Bit | 25.79 GiB | 0.028084 |
| cyankiwi/Qwen3.6-27B-AWQ-BF16-INT4 | 26.37 GiB | 0.018299 |
| cyankiwi/Qwen3.6-27B-AWQ-BF16-NVFP4 | 26.59 GiB | 0.032549 |

Qwen3.6-35B-A3B (MoE)

| Model | Weight size | KLD |
|---|---|---|
| Intel/Qwen3.6-35B-A3B-int4-mixed-AutoRound | 20.02 GiB | 0.032453 |
| rdtand/Qwen3.6-35B-A3B-PrismaQuant-4.75bit-vllm | 21.31 GiB | 0.036303 |
| nvidia/Qwen3.6-35B-A3B-NVFP4 | 21.82 GiB | 0.029490 |
| unsloth/Qwen3.6-35B-A3B-NVFP4 | 22.99 GiB | 0.052754 |
| cyankiwi/Qwen3.6-35B-A3B-AWQ-4bit | 23.25 GiB | 0.017126 |
| RedHatAI/Qwen3.6-35B-A3B-NVFP4 | 23.32 GiB | 0.046624 |
| QuantTrio/Qwen3.6-35B-A3B-AWQ | 23.71 GiB | 0.020767 |
| cyankiwi/Qwen3.6-35B-A3B-AWQ-NVFP4 | 23.86 GiB | 0.026335 |
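
If you want to pick a quant for a given weight-size budget, the two tables above can be queried with a few lines of Python. This is only a convenience sketch: the numbers are copied from the tables, while the helper name `lowest_kld` and the `max_gib` parameter are made up for illustration. The budget covers weights only, so KV cache and activations still need headroom on top.

```python
# (model, weight size in GiB, KLD vs. BF16), copied from the tables above
DENSE_27B = [
    ("Lorbus/Qwen3.6-27B-int4-AutoRound", 17.69, 0.031682),
    ("Intel/Qwen3.6-27B-int4-AutoRound", 17.69, 0.032569),
    ("sakamakismile/Qwen3.6-27B-NVFP4", 18.36, 0.092948),
    ("rdtand/Qwen3.6-27B-PrismaSCOUT-Blackwell-NVFP4-BF16-vllm", 18.79, 0.040911),
    ("cyankiwi/Qwen3.6-27B-AWQ-INT4", 19.04, 0.020443),
    ("berkerdooo/Qwen3.6-27B-NVFP4", 19.15, 0.043821),
    ("ocicek/Qwen3.6-27B-NVFP4", 19.15, 0.092993),
    ("QuantTrio/Qwen3.6-27B-AWQ", 20.35, 0.034925),
    ("unsloth/Qwen3.6-27B-NVFP4", 24.57, 0.039140),
    ("QuantTrio/Qwen3.6-27B-AWQ-6Bit", 25.79, 0.028084),
    ("cyankiwi/Qwen3.6-27B-AWQ-BF16-INT4", 26.37, 0.018299),
    ("cyankiwi/Qwen3.6-27B-AWQ-BF16-NVFP4", 26.59, 0.032549),
]

MOE_35B_A3B = [
    ("Intel/Qwen3.6-35B-A3B-int4-mixed-AutoRound", 20.02, 0.032453),
    ("rdtand/Qwen3.6-35B-A3B-PrismaQuant-4.75bit-vllm", 21.31, 0.036303),
    ("nvidia/Qwen3.6-35B-A3B-NVFP4", 21.82, 0.029490),
    ("unsloth/Qwen3.6-35B-A3B-NVFP4", 22.99, 0.052754),
    ("cyankiwi/Qwen3.6-35B-A3B-AWQ-4bit", 23.25, 0.017126),
    ("RedHatAI/Qwen3.6-35B-A3B-NVFP4", 23.32, 0.046624),
    ("QuantTrio/Qwen3.6-35B-A3B-AWQ", 23.71, 0.020767),
    ("cyankiwi/Qwen3.6-35B-A3B-AWQ-NVFP4", 23.86, 0.026335),
]


def lowest_kld(rows, max_gib=None):
    """Return the row with the lowest KLD, optionally limited to weights <= max_gib."""
    candidates = [r for r in rows if max_gib is None or r[1] <= max_gib]
    if not candidates:
        raise ValueError("no quant fits the size budget")
    return min(candidates, key=lambda r: r[2])


print(lowest_kld(DENSE_27B))                # best overall for the 27B dense model
print(lowest_kld(DENSE_27B, max_gib=20.0))  # best 27B quant at or under 20 GiB of weights
print(lowest_kld(MOE_35B_A3B))              # best overall for the 35B-A3B MoE model
```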

[Figure: Qwen3.6 KLD]

submitted by /u/_cpatonn