cyankiwi AWQ 4-bit — 26.05 update, NVFP4 + FP8 Dynamic quantization and benchmarks across Qwen3.6 4-bit quants
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
We are happy to share the cyankiwi AWQ update: an improved AWQ implementation, now with NVFP4 and FP8 Dynamic quantization support.

We measured KL divergence against the BF16 baseline for 4-bit Qwen3.6 quants, on GPQA Diamond responses synthesized with the BF16 Qwen3.6 models. The cyankiwi AWQ release has the lowest KL divergence on both the 27B dense model and the 35B-A3B MoE model.

Results were reported per model (the result figures are not preserved here):
Qwen3.6-27B (dense)
Qwen3.6-35B-A3B (MoE)
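For readers unfamiliar with the metric, here is a minimal sketch of how a per-token KL divergence between a BF16 reference and a quantized model could be computed from their logits. The post does not describe the actual evaluation pipeline, so everything here is an assumption: the direction KL(reference || quant), the plain average over token positions, the natural-log units, and the helper names `log_softmax`, `token_kl`, and `mean_kl` are all hypothetical. The toy logits stand in for real model outputs.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one token position's logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def token_kl(ref_logits, quant_logits):
    """KL(P_ref || P_quant) in nats for a single token position.

    Assumption: the BF16 model is the reference distribution P and the
    quantized model is Q. The post does not state the direction used.
    """
    ref_logp = log_softmax(ref_logits)
    quant_logp = log_softmax(quant_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(ref_logp, quant_logp))

def mean_kl(ref_seq, quant_seq):
    """Average per-token KL over aligned positions of one response."""
    if len(ref_seq) != len(quant_seq):
        raise ValueError("reference and quantized sequences must be aligned")
    kls = [token_kl(r, q) for r, q in zip(ref_seq, quant_seq)]
    return sum(kls) / len(kls)

# Toy data: two token positions over a three-token vocabulary.
ref = [[2.0, 1.0, 0.1], [0.5, 0.5, 3.0]]     # stand-in for BF16 logits
quant = [[1.9, 1.1, 0.0], [0.4, 0.7, 2.8]]   # stand-in for 4-bit logits

print(f"mean KL (nats): {mean_kl(ref, quant):.6f}")
```

In a real comparison both models would score the same fixed token sequence (here, the BF16-generated responses), so the positions line up and a lower mean KL means the quantized model's next-token distribution stays closer to the baseline.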