r/LocalLLaMA

Qwen 3.6 benchmarks on 2x RTX PRO 6000

Got a chance to play around with a 2x RTX PRO 6000 setup, so I'm sharing some numbers for Qwen 3.6.
All of these were run using the latest stable vLLM backend. This was for a personal project.

Qwen 3.6 27B BF16 (original, no quantization)

MTP - Off | 64 concurrency | 1600 tps generation

MTP - 2 | 32 concurrency | 1400 tps generation

MTP - 2 | 64 concurrency | 1800 tps generation

------

Qwen 3.6 35B BF16

MTP - Off | 64 concurrency | 2700 tps generation

MTP - Off | 128 concurrency | 3500 tps generation (Prompt Processing 30,000 tps)
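
For anyone who wants to compare the rows above, here is a rough Python sketch that turns them into per-request speeds. It assumes the "tps generation" figures are aggregate throughput summed over all concurrent requests, which the post does not state explicitly. The helper names (`per_request_tps`, `mtp_speedup`) are made up for illustration and are not part of vLLM.

```python
# Figures copied from the post: (model, MTP setting, concurrency, generation tps).
# Assumption: each tps value is the total across all concurrent requests.
RESULTS = [
    ("Qwen 3.6 27B BF16", "Off", 64, 1600),
    ("Qwen 3.6 27B BF16", "2", 32, 1400),
    ("Qwen 3.6 27B BF16", "2", 64, 1800),
    ("Qwen 3.6 35B BF16", "Off", 64, 2700),
    ("Qwen 3.6 35B BF16", "Off", 128, 3500),
]


def per_request_tps(total_tps, concurrency):
    """Average generation speed seen by a single request."""
    return total_tps / concurrency


def mtp_speedup(results, model, concurrency):
    """Ratio of MTP-on to MTP-off throughput at the same concurrency."""
    off = next(t for m, mtp, c, t in results
               if m == model and c == concurrency and mtp == "Off")
    on = next(t for m, mtp, c, t in results
              if m == model and c == concurrency and mtp != "Off")
    return on / off


for model, mtp, concurrency, tps in RESULTS:
    rate = per_request_tps(tps, concurrency)
    print(f"{model} | MTP {mtp} | {concurrency} concurrent | {rate:.1f} tps per request")

print("27B MTP speedup at 64 concurrency:",
      mtp_speedup(RESULTS, "Qwen 3.6 27B BF16", 64))
```

Under that assumption, the 27B model at 64 concurrency works out to about 25 tps per request with MTP off, and MTP 2 lifts aggregate throughput by roughly 12% at the same concurrency.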

submitted by /u/mxforest