Qwen 3.6 benchmarks on 2x RTX PRO 6000
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
Got a chance to play around with a 2x RTX PRO 6000 setup, so I'm sharing some numbers for Qwen 3.6.
All of these were run using the latest stable vLLM backend. This was for a personal project.
Qwen 3.6 27B BF16 (original, without any quantization)
------
MTP - Off | 64 concurrency | 1600 tps generation
MTP - 2 | 32 concurrency | 1400 tps generation
MTP - 2 | 64 concurrency | 1800 tps generation

Qwen 3.6 35B BF16
------
MTP - Off | 64 concurrency | 2700 tps generation
MTP - Off | 128 concurrency | 3500 tps generation (prompt processing: 30,000 tps)
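
For anyone who wants to compare these rows on a per-request basis, here is a rough sketch of the arithmetic. It assumes the tps figures above are aggregate generation throughput summed over all concurrent requests, which the post does not state explicitly. The helper names `per_stream_tps` and `speedup` are made up for illustration and are not part of vLLM.

```python
# Benchmark rows copied from the post:
# (model, MTP setting, concurrency, generation tps).
RUNS = [
    ("Qwen 3.6 27B BF16", "off", 64, 1600),
    ("Qwen 3.6 27B BF16", "2", 32, 1400),
    ("Qwen 3.6 27B BF16", "2", 64, 1800),
    ("Qwen 3.6 35B BF16", "off", 64, 2700),
    ("Qwen 3.6 35B BF16", "off", 128, 3500),
]


def per_stream_tps(total_tps, concurrency):
    """Average tokens/s seen by one request.

    ASSUMPTION: total_tps is the aggregate across all concurrent requests.
    """
    return total_tps / concurrency


def speedup(new_tps, base_tps):
    """Ratio of two throughput numbers measured at the same concurrency."""
    return new_tps / base_tps


for model, mtp, concurrency, tps in RUNS:
    rate = per_stream_tps(tps, concurrency)
    print(f"{model} | MTP {mtp} | {concurrency} concurrent | {rate:.1f} tps per stream")

# 27B at 64 concurrency: MTP 2 versus MTP off.
print(f"27B, MTP 2 vs off at 64 concurrency: {speedup(1800, 1600):.3f}x")
```

If the aggregate assumption holds, the 27B model with MTP off at 64 concurrency works out to about 25 tps per request, and MTP 2 at the same concurrency gives roughly a 1.125x gain in total throughput. If the numbers were measured some other way, the per-stream figures do not apply.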