r/LocalLLaMA · May 24, 2026 · 1 min read

minor speed bump for MTP with Qwen3.6-27B-MTP Q6_K_XL

I'm on a MacBook with an M5 Max and 128GB RAM.

Ran a test in Open WebUI using llama-server (llama.cpp):

unsloth/Qwen3.6-27B-UD-Q6_K_XL.gguf (non MTP): 19 tps
unsloth/Qwen3.6-27B-UD-Q6_K_XL.gguf (MTP): 22.3 tps

So nothing like the massive improvements I hear about. Possibly my own settings though.

Both runs use:

--temp 0.6 --top-p 0.8 --top-k 20 --min-p 0.00 --cache-ram 24576 --batch-size 4096 --ubatch-size 2048
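
For anyone who wants to reproduce this, here is a rough sketch of how the shared flags above could be assembled into a llama-server command line. The model path is a made-up placeholder (the post doesn't say where the GGUF lives), and `-m` is the usual llama.cpp model flag; everything else is copied from the flag list above. It only prints the command, it doesn't launch anything.

```python
import shlex

# Hypothetical local path; not stated in the post.
MODEL_PATH = "models/Qwen3.6-27B-UD-Q6_K_XL.gguf"

# Flags shared by both runs, copied verbatim from the post.
SHARED_FLAGS = [
    "--temp", "0.6",
    "--top-p", "0.8",
    "--top-k", "20",
    "--min-p", "0.00",
    "--cache-ram", "24576",
    "--batch-size", "4096",
    "--ubatch-size", "2048",
]


def build_command(model_path, extra_flags=()):
    """Return the llama-server argument list for one run."""
    return ["llama-server", "-m", model_path, *SHARED_FLAGS, *extra_flags]


# Print a copy-pasteable shell command for the shared configuration.
print(shlex.join(build_command(MODEL_PATH)))
```

Any extra per-run flags can be passed through `extra_flags`, so the two runs differ only in that argument and the model file.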

edit: forgot to add that I was using --spec-draft-n-max 2. I've changed it to 3 and also added --spec-draft-p-min 0.75, and now get 24.5 tps (for gen).
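
To put the numbers in perspective, here is a quick sketch that turns the three tps figures from this post into relative speedups over the non-MTP run. It works out to roughly 17% for the first MTP run and roughly 29% after the settings change. The labels are just my shorthand for the runs described above, and these are single measurements, not averages.

```python
# Generation speeds reported in the post, in tokens per second.
BASELINE_TPS = 19.0  # non-MTP run

MTP_RUNS = {
    "MTP, --spec-draft-n-max 2": 22.3,
    "MTP, --spec-draft-n-max 3 + --spec-draft-p-min 0.75": 24.5,
}


def speedup_percent(tps, baseline=BASELINE_TPS):
    """Relative gain over the baseline, as a percentage."""
    return (tps / baseline - 1.0) * 100.0


for label, tps in MTP_RUNS.items():
    print(f"{label}: {tps} tps ({speedup_percent(tps):+.1f}% vs non-MTP)")
```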
