
Is there a big gap between Q4 and Q6 on Qwen3.6?

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

I’ve got one 3090 and, thanks to MTP and the like, I get around 65 tok/s on the Qwen3.6 27B dense model. But I’m running Q4_K_M so that everything fits, and my context isn’t very high: maybe 65k, up to 100k.

I’ve been toying with the idea of a second 3090. But I already have some gaming PCs running parallel workloads on smaller 3080 (2x) and 4080S cards to support my 3090, so the real benefit of a second 3090 seems to be running a higher quant.
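For a rough sense of why Q6 is tight on a single card, here is a back-of-the-envelope sketch of the weight footprint at each quant. The bits-per-weight figures are approximate values commonly quoted for llama.cpp K-quants and are assumptions here, not measurements of this model; the 24 GB of a 3090 is treated as 24 GiB for simplicity, and KV cache and runtime overhead are not counted.

```python
# Rough estimate of quantized weight size; all inputs are assumptions.

def weights_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30

# Approximate average bits per weight (assumed; real GGUF files vary by model).
APPROX_BPW = {"Q4_K_M": 4.85, "Q6_K": 6.56}
VRAM_GIB = 24  # one 3090, simplified to 24 GiB

for quant, bpw in APPROX_BPW.items():
    size = weights_gib(27, bpw)
    # Whatever is left has to hold the KV cache and any other buffers.
    print(f"{quant}: ~{size:.1f} GiB weights, ~{VRAM_GIB - size:.1f} GiB left")
```

Under these assumptions the Q6 weights alone take most of a 24 GB card, which leaves little room for long context without a second GPU.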

For those who do run a higher quant like Q6, have you noticed a big difference?

submitted by /u/vick2djax
