1000 tps generation on Qwen3.6 27B with V100s
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
I wanted to see what the absolute best-case scenario for generation on this setup was, and I was not disappointed. 128 concurrent requests is far removed from what I need, but it's funny to see a big number. For a single user (batch 1, not 128), generation is around 80 t/s with 3000 t/s prompt processing, no MTP!!
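For anyone comparing the two numbers, here is a rough back-of-the-envelope sketch. It assumes the 1000 t/s figure is the aggregate across all 128 concurrent requests, which the post implies but does not state outright. The variable and function names are made up for illustration.

```python
# Figures quoted in the post. Treating 1000 t/s as the aggregate
# across all concurrent requests is an assumption, not a stated fact.
aggregate_tps = 1000.0       # total generation throughput at batch 128
concurrent_requests = 128    # number of simultaneous requests
single_user_tps = 80.0       # generation speed at batch 1


def per_request_tps(aggregate: float, n_requests: int) -> float:
    """Average tokens per second each request sees under an even split."""
    return aggregate / n_requests


per_req = per_request_tps(aggregate_tps, concurrent_requests)
batching_gain = aggregate_tps / single_user_tps

print(f"{per_req:.1f} t/s per request, "
      f"{batching_gain:.1f}x aggregate gain over batch 1")
```

Under that assumption each individual request would see roughly 7.8 t/s, while total throughput is about 12.5x the single-user rate. That is why the big number is fun to look at but not what a single user experiences.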