r/LocalLLaMA · June 21, 2026 · 1 min read

Gemma 4 QAT seems to respond significantly better to KV cache quantization


KLD on wikitext with 16k context

My hardware isn't up to testing 31B. If anyone else feels like investigating, it would be interesting to see the results.
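For anyone who wants to reproduce this kind of comparison, here is a rough sketch of what a KLD number means: the per-token KL divergence between the next-token distribution of a baseline run and that of a run with a quantized KV cache, averaged over the evaluated tokens. The post doesn't say which tool produced the measurements, so the function names and the toy logits below are purely illustrative assumptions, not the setup actually used.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def mean_kld(base_logits, test_logits):
    # Hypothetical helper: mean KL(P_base || P_test) over token positions.
    # Both inputs have shape (num_tokens, vocab_size).
    log_p = log_softmax(np.asarray(base_logits, dtype=np.float64))
    log_q = log_softmax(np.asarray(test_logits, dtype=np.float64))
    per_token = (np.exp(log_p) * (log_p - log_q)).sum(axis=-1)
    return float(per_token.mean())

# Toy stand-ins for real model outputs (assumption: random logits,
# with small noise playing the role of KV cache quantization error).
rng = np.random.default_rng(0)
base = rng.normal(size=(8, 32))
noisy = base + rng.normal(scale=0.05, size=base.shape)

print("identical runs:", mean_kld(base, base))
print("perturbed run: ", mean_kld(base, noisy))
```

Lower is better: a value of zero means the quantized run predicts exactly the same distribution as the baseline, so a model that "responds better" to KV cache quantization shows a smaller KLD at the same cache precision.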
