r/LocalLLaMA · May 27, 2026 · 3 min read

KV cache quant benchmarks: q5 & q6 are underrated, q8/q4 is bad, TCQ has a niche


Here's my article with 38 KV cache quant pairs thoroughly benchmarked for KL divergence (KLD) across 3 different Qwen 3.6 27B configs: Q5_K_S + 64k context, IQ4_XS + 64k context, and IQ4_XS + 128k context. This lets us track not only how cache quantization affects precision in a vacuum, but also how it interacts with noise from the model itself.
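The post reports each cache type as KLD against a reference run but doesn't spell out the computation. Here is a rough sketch of the usual definition. It is an assumption, not the BeeLlama.cpp code: for every token position, take KL(P_ref || P_test) between the reference and test next-token distributions, then report the mean and the 99.9th percentile over all positions. The function names and the linear-interpolation percentile are illustrative choices and may differ from what the actual tool does.

```python
import math


def log_softmax(logits):
    # Numerically stable log-softmax: subtract the max before exponentiating.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def token_kld(ref_logits, test_logits):
    # KL(P_ref || P_test) for one token position, in nats.
    lp = log_softmax(ref_logits)
    lq = log_softmax(test_logits)
    return sum(math.exp(a) * (a - b) for a, b in zip(lp, lq))


def percentile(values, pct):
    # Linear interpolation between closest ranks (assumed method).
    s = sorted(values)
    rank = (pct / 100.0) * (len(s) - 1)
    lo = math.floor(rank)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (rank - lo)


def kld_summary(ref_rows, test_rows):
    # ref_rows / test_rows: one logit vector per token position, from the
    # reference run and from the run with the quantized KV cache.
    klds = [token_kld(r, t) for r, t in zip(ref_rows, test_rows)]
    return {"mean": sum(klds) / len(klds), "p99.9": percentile(klds, 99.9)}


if __name__ == "__main__":
    ref = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]
    test = [[1.8, 0.6, -0.9], [0.1, 0.2, 0.3]]
    print(kld_summary(ref, test))
```

Tracking the 99.9th percentile alongside the mean exposes rare tokens where the test distribution diverges sharply, which a mean alone can hide.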

All benchmarks were done using my BeeLlama.cpp fork, which let me include several quant types that are not present in mainline llama.cpp: vanilla TurboQuant, TCQ 3-bit/2-bit, and q6_0.

https://anbeeld.com/articles/kv-cache-quantization-benchmarks-for-long-context

TL;DR

q5_0 KV is underrated, and so is q5_1 as a V cache. Neither gets the attention it deserves. The data shows they provide solid mid-range performance without being as heavy as q8_0 or as shitty as q4_0.
q8_0 / q4_* is overrated. Strong K does not fully rescue weak V: these pairs are too unbalanced and perform worse than their community reputation suggests. For example, q8_0-q4_0 costs the same 40.6% as q6_0 in the table below but has a noticeably higher mean KLD (0.003316 vs 0.002614).
Prefer sane KV quants over wasting VRAM on a bf16 cache for heavily quantized weights. A Q4/IQ4 model with full bf16 KV looks like the wrong trade to me; both draw from the same VRAM pool, so you might want to balance them better.
Practical ladder: q8_0 / q6_0 or q8_0 / q5_1 for high-end, q6_0 / q5_0 for extra headroom, q5_0 / q5_0 or q5_0 / q4_1 when VRAM is tight, and q4_0 / q4_0 only if nothing else lets you fit the desired context.
TurboQuant is confirmed to be useful only for extreme compression. turbo3_tcq is the only type with decent quality per size; turbo4 is basically useless while also being slow (705 tok/s, the lowest in the Q5_K_S table; every other entry runs at 786 tok/s or more).
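Since K and V share one VRAM budget, it helps to see where the Size column of the Q5_K_S table comes from. A minimal sketch, assuming K and V caches hold the same number of elements so a pair costs the mean of the two bit widths, relative to bf16. The mainline ggml widths follow from their 32-element block layouts. The widths for q6_0 and the turbo types are an assumption back-calculated from the Size column, not read from the fork's source. `parse_pair` and `size_fraction` are hypothetical helper names.

```python
# Bits per cached element for each KV cache type.
# Mainline ggml types: exact, from their 32-element block layouts.
GGML_BITS = {
    "bf16": 16.0,
    "q8_0": 8.5,  # 32 x int8 + fp16 scale = 34 bytes per 32 values
    "q5_1": 6.0,  # 32 x 5-bit + fp16 scale + fp16 min = 24 bytes
    "q5_0": 5.5,  # 32 x 5-bit + fp16 scale = 22 bytes
    "q4_1": 5.0,  # 32 x 4-bit + fp16 scale + fp16 min = 20 bytes
    "q4_0": 4.5,  # 32 x 4-bit + fp16 scale = 18 bytes
}

# Fork-only types: ASSUMED values, back-calculated from the Size column,
# not read from the BeeLlama.cpp source.
INFERRED_BITS = {
    "q6_0": 6.5,
    "turbo4": 4.125,
    "turbo3_tcq": 3.25,
    "turbo3": 3.125,
    "turbo2_tcq": 2.25,
    "turbo2": 2.125,
}

BITS = {**GGML_BITS, **INFERRED_BITS}


def parse_pair(label):
    # "q8_0-q6_0" -> ("q8_0", "q6_0"), K first, V second.
    # A single name such as "q5_0" means K and V use the same type.
    parts = label.split("-")
    if len(parts) == 1:
        return parts[0], parts[0]
    k, v = parts
    return k, v


def size_fraction(label):
    # Assumes the K and V caches hold the same number of elements, so the
    # pair's footprint is the mean of the two widths, relative to bf16.
    k, v = parse_pair(label)
    return (BITS[k] + BITS[v]) / 2 / BITS["bf16"]


if __name__ == "__main__":
    for label in ["bf16", "q8_0", "q8_0-q4_0", "q6_0", "q5_0", "turbo3_tcq"]:
        print(f"{label:12s} {size_fraction(label):.1%}")
```

Running it reproduces the 53.1%, 40.6%, 34.4% and 20.3% figures from the table, and shows that q8_0-q4_0 and q6_0 land on exactly the same budget.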

KLD results on Q5_K_S + 64k context

The rest of the benchmark data and the in-depth analysis are available in the article.

Cache (K-V)	Size (% of bf16)	Mean KLD	Mean precision	99.9% KLD	99.9% precision	Tok/s
bf16	100.0%	0.000375	100.00%	0.023258	100.00%	850.81
q8_0	53.1%	0.002328	99.80%	0.078709	94.61%	851.11
q8_0-q6_0	46.9%	0.002499	99.79%	0.081616	94.33%	848.78
q8_0-q5_1	45.3%	0.002529	99.78%	0.082880	94.21%	828.63
q8_0-q5_0	43.8%	0.002656	99.77%	0.088486	93.69%	847.33
q8_0-q4_1	42.2%	0.003080	99.73%	0.099080	92.70%	786.54
q8_0-q4_0	40.6%	0.003316	99.71%	0.104680	92.18%	849.37
q6_0	40.6%	0.002614	99.78%	0.090800	93.47%	845.96
q8_0-turbo4	39.5%	0.003561	99.68%	0.103041	92.33%	838.90
q6_0-q5_1	39.1%	0.002781	99.76%	0.090447	93.50%	846.24
q5_1	37.5%	0.002911	99.75%	0.098354	92.77%	841.65
q6_0-q5_0	37.5%	0.002820	99.76%	0.092682	93.29%	846.86
q8_0-turbo3_tcq	36.7%	0.005090	99.53%	0.149387	88.15%	817.57
q6_0-q4_1	35.9%	0.003312	99.71%	0.104582	92.19%	848.42
q5_0	34.4%	0.003206	99.72%	0.099073	92.70%	849.79
q5_1-q4_1	34.4%	0.003380	99.70%	0.095011	93.08%	846.27
q6_0-q4_0	34.4%	0.003288	99.71%	0.111566	91.55%	848.24
q6_0-turbo4	33.2%	0.003748	99.66%	0.107377	91.93%	837.77
q5_0-q4_1	32.8%	0.003471	99.69%	0.099618	92.65%	847.59
q5_1-q4_0	32.8%	0.003626	99.68%	0.108649	91.82%	846.91
q4_1	31.3%	0.004476	99.59%	0.141813	88.82%	854.33
q5_0-q4_0	31.3%	0.003581	99.68%	0.113332	91.39%	847.64
q6_0-turbo3_tcq	30.5%	0.005379	99.50%	0.154680	87.68%	819.23
q5_0-turbo4	30.1%	0.003812	99.66%	0.112249	91.49%	837.52
q5_1-turbo3_tcq	28.9%	0.005594	99.48%	0.144591	88.57%	816.05
q4_0	28.1%	0.004711	99.57%	0.130419	89.84%	855.08
q5_0-turbo3_tcq	27.3%	0.005471	99.49%	0.158514	87.35%	815.80
q5_0-turbo3	27.0%	0.007097	99.33%	0.192428	84.44%	837.90
q4_1-turbo3_tcq	25.8%	0.006184	99.42%	0.174831	85.94%	816.95
turbo4	25.8%	0.004760	99.55%	0.138370	89.13%	705.32
q4_0-turbo3_tcq	24.2%	0.006269	99.41%	0.186572	84.93%	821.89
q4_0-turbo3	23.8%	0.008235	99.22%	0.222154	81.96%	839.29
q4_0-turbo2_tcq	21.1%	0.015168	98.53%	0.395244	68.94%	826.07
turbo3_tcq	20.3%	0.007978	99.24%	0.227104	81.56%	795.20
turbo3	19.5%	0.011181	98.93%	0.296060	76.12%	836.75
turbo3_tcq-turbo2_tcq	17.2%	0.016386	98.41%	0.437043	66.11%	796.16
turbo3-turbo2	16.4%	0.023985	97.67%	0.605087	55.89%	831.88
turbo2_tcq	14.1%	0.023073	97.76%	0.632401	54.38%	807.25
turbo2	13.3%	0.036230	96.48%	0.903576	41.47%	842.29
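The two precision columns aren't defined in the post. However, every row checked is consistent with precision = 100 * exp(-(KLD - KLD_bf16)), a score relative to the bf16 row, which would explain why bf16 reads 100.00%. Treat that formula as a reconstruction, not the article's definition. The sketch below recomputes a few precision values. It also takes a subset of rows copied from the table and flags which cache pairs are Pareto-dominated on size vs mean KLD, meaning another row is no larger and no worse. Tok/s is ignored. `precision` and `pareto_split` are hypothetical helper names.

```python
import math

# bf16 baseline KLDs from the Q5_K_S + 64k table.
BF16_MEAN_KLD = 0.000375
BF16_P999_KLD = 0.023258


def precision(kld, baseline_kld):
    # Reconstructed formula (assumption): precision relative to the bf16 row.
    return 100.0 * math.exp(-(kld - baseline_kld))


# (cache pair, size % of bf16, mean KLD), copied from the table above.
ROWS = [
    ("q8_0", 53.1, 0.002328),
    ("q8_0-q6_0", 46.9, 0.002499),
    ("q8_0-q5_1", 45.3, 0.002529),
    ("q8_0-q5_0", 43.8, 0.002656),
    ("q8_0-q4_1", 42.2, 0.003080),
    ("q8_0-q4_0", 40.6, 0.003316),
    ("q6_0", 40.6, 0.002614),
    ("q6_0-q5_1", 39.1, 0.002781),
    ("q5_1", 37.5, 0.002911),
    ("q6_0-q5_0", 37.5, 0.002820),
    ("q5_0", 34.4, 0.003206),
    ("q4_0", 28.1, 0.004711),
]


def pareto_split(rows):
    # A row is dominated if another row is no larger, no worse in mean KLD,
    # and strictly better on at least one of the two.
    frontier, dominated = [], []
    for name, size, kld in rows:
        beaten = any(
            s <= size and k <= kld and (s < size or k < kld)
            for other, s, k in rows
            if other != name
        )
        (dominated if beaten else frontier).append(name)
    return frontier, dominated


if __name__ == "__main__":
    print(f"q8_0 mean precision: {precision(0.002328, BF16_MEAN_KLD):.2f}%")
    print(f"q8_0 99.9% precision: {precision(0.078709, BF16_P999_KLD):.2f}%")
    frontier, dominated = pareto_split(ROWS)
    print("frontier:", frontier)
    print("dominated:", dominated)
```

On this subset the dominated entries are q8_0-q5_0, q8_0-q4_1 and q8_0-q4_0 (all beaten by q6_0) and q5_1 (beaten by q6_0-q5_0). Every practical-ladder pick included in the subset lands on the frontier. Adding the remaining rows or weighing Tok/s could change which entries survive.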

submitted by /u/Anbeeld