r/LocalLLaMA

Qwen 3.6 27B KV cache quant benchmarks: 75 pairs, q8/q6/q5/q4, KVarN, Turbo/TCQ

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

Full benchmark results and in-depth analysis are available in two articles: "KV Cache Quantization Benchmarks for Long Context" and "KVarN KV Cache: Implementation and Benchmarks".

BeeLlama.cpp (my llama.cpp fork) was used as the inference engine because it supports additional KV cache types: KVarN (as of v0.3.2 Preview), q6_0, TurboQuant, and TCQ.
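For a rough sense of what the q8/q6/q5/q4 labels mean in memory terms, here is a back-of-the-envelope sketch. It assumes the types are llama.cpp-style 32-element block formats (q8_0, q5_0, q4_0, each with one fp16 scale per block) and that q6_0 follows the ik_llama.cpp layout of 26 bytes per block; BeeLlama.cpp's actual layouts are not confirmed here, and KVarN, TurboQuant and TCQ are not modeled at all. The function names and the model shapes passed to kv_cache_bytes are hypothetical illustrations, not values for Qwen 3.6 27B.

```python
from fractions import Fraction

BLOCK_SIZE = 32  # elements per quantization block in llama.cpp's *_0 types

# Bytes per 32-element block. f16 has no block scale: 32 halves = 64 bytes.
BLOCK_BYTES = {
    "f16": 64,
    "q8_0": 34,  # fp16 scale + 32 x int8
    "q6_0": 26,  # ASSUMED ik_llama.cpp-style layout: fp16 scale + 8 B high bits + 16 B low nibbles
    "q5_0": 22,  # fp16 scale + 4 B high bits + 16 B low nibbles
    "q4_0": 18,  # fp16 scale + 16 B of packed 4-bit values
}


def bits_per_element(cache_type: str) -> Fraction:
    """Storage cost per cached value, including the amortized per-block scale."""
    return Fraction(BLOCK_BYTES[cache_type] * 8, BLOCK_SIZE)


def kv_size_vs_f16(k_type: str, v_type: str) -> Fraction:
    """KV cache size relative to an f16/f16 cache; independent of model shape."""
    f16 = bits_per_element("f16")
    return (bits_per_element(k_type) + bits_per_element(v_type)) / (2 * f16)


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_ctx: int, k_type: str, v_type: str) -> int:
    """Approximate KV cache bytes for a plain attention stack (hypothetical shapes)."""
    row = n_kv_heads * head_dim  # K (or V) values stored per token per layer
    if row % BLOCK_SIZE:
        raise ValueError("row width must be a multiple of the block size")
    blocks = row // BLOCK_SIZE
    per_token_layer = blocks * (BLOCK_BYTES[k_type] + BLOCK_BYTES[v_type])
    return n_layers * n_ctx * per_token_layer


if __name__ == "__main__":
    # Symmetric pairs plus one mixed K/V pair, as fractions of an f16 cache
    for k, v in [("f16", "f16"), ("q8_0", "q8_0"), ("q6_0", "q6_0"),
                 ("q5_0", "q5_0"), ("q4_0", "q4_0"), ("q8_0", "q4_0")]:
        print(f"K={k:5} V={v:5} {float(kv_size_vs_f16(k, v)):.3f}x f16")
```

Under these assumptions a q8_0 K with q4_0 V costs exactly as much as q6_0 for both (13/32, about 41% of f16), which is why mixed K/V pairs are worth benchmarking alongside symmetric ones. Padding, non-attention layers and any per-type overhead in the fork are ignored.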

submitted by /u/Anbeeld
