r/LocalLLaMA

125 tok/s for Qwen3.6 q4xl on 2x 4060ti is insane perf/dollar

Under $1000 for 32 GB of VRAM on 2023 hardware, at ~300 W draw, and this thing is outperforming the latest pick-your-vendor $5k mini PCs from 2026.

So the next question: can I squeeze 150 t/s out of the same q4xl on CUDA 13.3 this weekend? Has anyone tried it yet?
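
For anyone who wants the perf/dollar math spelled out, here is a rough sketch using only the numbers quoted above. Assumptions: the price is taken as the $1000 upper bound, the ~300 W figure is treated as sustained draw during generation, and the variable names are just illustrative. No throughput number for the $5k mini PCs is given here, so the sketch only works out what such a box would need to hit to match.

```python
# Figures quoted in the post. PRICE_USD is an upper bound ("under $1000"),
# and POWER_W is assumed to be the sustained draw while generating.
TOK_PER_S = 125.0
PRICE_USD = 1000.0
POWER_W = 300.0
MINI_PC_PRICE_USD = 5000.0
TARGET_TOK_PER_S = 150.0

# Throughput per dollar and per watt for the 2x 4060 Ti box.
tok_per_s_per_dollar = TOK_PER_S / PRICE_USD
tok_per_s_per_watt = TOK_PER_S / POWER_W

# Energy per generated token (watts are joules per second).
joules_per_token = POWER_W / TOK_PER_S

# Throughput a $5k machine would need to match the same tok/s per dollar.
break_even_tok_per_s = tok_per_s_per_dollar * MINI_PC_PRICE_USD

# Relative speedup needed to go from 125 to 150 tok/s.
required_speedup = (TARGET_TOK_PER_S - TOK_PER_S) / TOK_PER_S

print(f"tok/s per dollar: {tok_per_s_per_dollar:.3f}")
print(f"tok/s per watt: {tok_per_s_per_watt:.3f}")
print(f"joules per token: {joules_per_token:.2f}")
print(f"break-even tok/s at $5k: {break_even_tok_per_s:.0f}")
print(f"speedup needed for 150 tok/s: {required_speedup:.0%}")
```

So a $5k box would need roughly 5x the throughput to match on perf/dollar, and the 150 t/s goal is a 20% bump over the current number.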

submitted by /u/Chuyito