r/LocalLLaMA

Gemma 4 QAT seems to respond significantly better to KV cache quantization

KL divergence (KLD) measured on wikitext with a 16k context
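For anyone unfamiliar with the metric, here is a minimal sketch of how a per-token KL divergence between a baseline run and a quantized-KV-cache run can be computed from raw logits. This is not necessarily the tooling used for the numbers in this post; the function names (`log_softmax`, `kl_divergence`, `mean_kld`) and the toy logits are made up for illustration, and it assumes both runs saw the same tokens in the same order.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one token's logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def kl_divergence(base_logits, test_logits):
    """KL(P_base || P_test) in nats for a single token position."""
    log_p = log_softmax(base_logits)
    log_q = log_softmax(test_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

def mean_kld(base_rows, test_rows):
    """Average the per-token KL divergence over all positions."""
    values = [kl_divergence(b, t) for b, t in zip(base_rows, test_rows)]
    return sum(values) / len(values)

# Toy example (hypothetical logits): two positions, vocabulary of three tokens.
baseline = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]   # e.g. unquantized KV cache
quantized = [[1.9, 0.6, -1.1], [0.0, 0.4, 2.8]]  # e.g. quantized KV cache
print(f"mean KLD: {mean_kld(baseline, quantized):.6f} nats")
```

A lower mean KLD means the quantized-cache run's output distribution stays closer to the baseline, which is the sense in which a model "responds better" to KV cache quantization.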

My hardware isn't up to testing 31B. If anyone else feels like investigating, it would be interesting to see how it compares.

submitted by /u/rima_2711
