What's this sub's general opinion on quantizing the KV cache?
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
Assume I'm talking about Qwen3.6b-27b for coding.
I hear a lot about quantizing the model but almost no opinions on quantizing the KV cache for this model.
EDIT: Btw thanks everyone, I'm in awe of how much I learn from this sub every day.