r/LocalLLaMA · June 3, 2026 · 1 min read

llama.cpp - Qwen3.6/3.5-MTP - Share your benchmarks t/s

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

I think the dust has settled (95%+) for Qwen3.6/3.5-MTP. Since the initial PR, there have been many optimizations and fixes. Even earlier today, another MTP-related PR was merged and released (b9495). So try the latest version and share your benchmarks (t/s)*. Great work by u/am17an and other folks.

* Please share all the details so the results are useful for others too. When key details are missing, benchmark numbers are hard to interpret. I/we would also like to see the most optimized full command for getting the best t/s.

To save time, just copy your console output together with the full command (it has all the important details: model quant, context size, KV cache, fit/ncmoe, MTP, etc.) and paste it here. A sample is below (not mine; pasted from a random thread).

llama-server \
  -m ../models/Qwen3.6-35B-A3B-MTP-UD-Q5_K_XL.gguf \
  --host 0.0.0.0 \
  --port 8080 \
  --ctx-size 150000 \
  --flash-attn on \
  -b 2048 \
  -ub 512 \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  --jinja \
  --threads 11 \
  --threads-batch 11 \
  -cram 12288 \
  --mlock \
  -fit on \
  --chat-template-kwargs '{"preserve_thinking": true}' \
  --spec-type mtp \
  --spec-draft-n-max 3 \
  --temp 0.6 \
  --top-p 0.95 \
  --top-k 20 \
  --min-p 0.0 \
  -np 1 \
  --presence-penalty 0.0 \
  --repeat-penalty 1.0

prompt eval time = 128889.09 ms / 26796 tokens (4.81 ms per token, 207.90 tokens per second)
eval time = 10969.17 ms / 264 tokens (41.55 ms per token, 24.07 tokens per second)
total time = 139858.26 ms / 27060 tokens
draft acceptance rate = 0.52614 ( 161 accepted / 306 generated)
statistics mtp: #calls(b,g,a) = 6 2811 2305, #gen drafts = 2811, #acc drafts = 2305, #gen tokens = 8433, #acc tokens = 5507, dur(b,g,a) = 0.020, 41478.073, 74.975 ms
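If you want to double-check the numbers before posting, here is a rough Python sketch that recomputes prompt-processing t/s, generation t/s, and the draft acceptance rate from the timing lines in the sample above. It assumes your log lines have the same shape as the ones in the sample; the `summarize` helper and the `pp_tps` / `tg_tps` / `accept_rate` key names are made up for illustration and are not part of llama.cpp.

```python
import re

# Timing lines copied from the sample output in this post.
LOG = """\
prompt eval time = 128889.09 ms / 26796 tokens (4.81 ms per token, 207.90 tokens per second)
eval time = 10969.17 ms / 264 tokens (41.55 ms per token, 24.07 tokens per second)
total time = 139858.26 ms / 27060 tokens
draft acceptance rate = 0.52614 ( 161 accepted / 306 generated)
"""

# Assumption: each timing line looks like "<name> time = <ms> ms / <n> tokens",
# optionally indented, as in the sample above.
TIMING = re.compile(
    r"^\s*(prompt eval|eval) time\s*=\s*([\d.]+) ms\s*/\s*(\d+) tokens", re.M
)
DRAFT = re.compile(
    r"draft acceptance rate\s*=\s*[\d.]+\s*\(\s*(\d+) accepted\s*/\s*(\d+) generated\)"
)


def summarize(log: str) -> dict:
    """Recompute tokens/second and draft acceptance from raw counts (hypothetical helper)."""
    out = {}
    for name, ms, tokens in TIMING.findall(log):
        key = "pp_tps" if name == "prompt eval" else "tg_tps"
        out[key] = int(tokens) / (float(ms) / 1000.0)  # tokens per second
    match = DRAFT.search(log)
    if match:
        accepted, generated = int(match.group(1)), int(match.group(2))
        out["accept_rate"] = accepted / generated
    return out


summary = summarize(LOG)
for key, value in summary.items():
    print(f"{key}: {value:.4f}")
```

Recomputing from the raw millisecond and token counts is a cheap way to catch a paste where the command and the timings came from different runs.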

submitted by /u/pmttyji