r/LocalLLaMA

Qwen 3.6 35B-A3B @ Q4 or Gemma 4 12B @ Q8?


Wondering how much model quantization matters here. My daily driver on a 32 GB unified-memory setup is the Qwen model, which outputs ~15 tokens per second.

I've heard good things about the 12B Gemma 4 model, so I'm interested in trying it against my codebase. Given its size, I can very comfortably fit the Q8. Hell, I could probably run it at BF16 lol
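For a rough sanity check on what fits in 32 GB, here is a back-of-envelope sketch of weight size only. It assumes nominal 4, 8, and 16 bits per weight for Q4, Q8, and BF16; real quantized files carry some extra per-block overhead, and the KV cache, context, and OS all need memory on top, so treat these numbers as a lower bound. The helper name `weight_gb` is made up for illustration.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes).

    Assumption: every parameter is stored at exactly bits_per_weight bits.
    Ignores quantization block overhead, KV cache, and runtime buffers.
    """
    return params_billion * bits_per_weight / 8


# Nominal bit widths are assumptions, not measured file sizes.
candidates = {
    "35B @ 4-bit": weight_gb(35, 4),
    "12B @ 8-bit": weight_gb(12, 8),
    "12B @ 16-bit": weight_gb(12, 16),
}

for name, gb in candidates.items():
    print(f"{name}: ~{gb:.1f} GB of weights")
```

By this estimate the 12B model at 16-bit would take more weight memory than the 35B model at 4-bit, so BF16 fits on paper but leaves less headroom for context.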

submitted by /u/mailto_devnull