Is there a big gap between Q4 and Q6 on Qwen3.6?
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
I’ve got one 3090 and, thanks to MTP and the like, I get around 65 tok/s on Qwen3.6 27B dense. But I’m running at Q4_M so that everything fits, and my context isn’t very high: maybe 65k, or up to 100k.
I’ve been toying with the idea of a second 3090. I already have some gaming PCs with smaller cards (2x 3080 and 4080S) running parallel workloads alongside my 3090, so it seems the real benefit of a second 3090 would be running a higher quant.
For those who do run a higher quant, have you noticed a big difference?
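For context on why Q6 looks like a two-card job, here is a rough back-of-the-envelope sketch of weight size alone. The bits-per-weight figures are approximate assumptions, not measured file sizes, and the helper name `weight_gib` is just for illustration. KV cache and runtime overhead come on top of this and grow with context length.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB (weights only)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Assumed, approximate average bits per weight for each quant level.
# Real GGUF files mix tensor types, so actual sizes will differ somewhat.
ASSUMED_BPW = {"Q4": 4.8, "Q6": 6.6}

CARD_GIB = 24  # one RTX 3090

for name, bpw in ASSUMED_BPW.items():
    size = weight_gib(27, bpw)
    headroom = CARD_GIB - size
    print(f"{name}: ~{size:.1f} GiB weights, ~{headroom:.1f} GiB left for KV cache and overhead")
```

Under these assumptions the Q6 weights leave only a few GiB of headroom on a single 24 GB card, which is why long context at Q6 points toward a second card.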