minor speed bump for MTP with Qwen3.6-27B-MTP Q6_K_XL
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
I'm on a MacBook with an M5 Max and 128GB RAM.
Running a test in Open WebUI using llama-server (llama.cpp):
unsloth/Qwen3.6-27B-UD-Q6_K_XL.gguf (non-MTP): 19 tps
unsloth/Qwen3.6-27B-UD-Q6_K_XL.gguf (MTP): 22.3 tps
So nothing like the massive improvements I hear about. Possibly my own settings though.
Both use:
--temp 0.6 --top-p 0.8 --top-k 20 --min-p 0.00 --cache-ram 24576 --batch-size 4096 --ubatch-size 2048
Edit: I forgot to add that I was using --spec-draft-n-max 2. I have changed it to 3 and also added --spec-draft-p-min 0.75, and now get 24.5 tps (for generation).
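To put the numbers above in relative terms, here is a quick sketch that computes the speedup of each MTP run over the non-MTP baseline. The throughput figures are the ones reported in this post. The run labels and the `speedup_percent` helper are only illustrative names, and the sketch assumes the 22.3 tps run was the one using --spec-draft-n-max 2, as the edit suggests.

```python
# Throughput figures (tokens/s) as reported in the post.
BASELINE_TPS = 19.0  # non-MTP run

# Labels are illustrative; the first one assumes the original MTP run
# used --spec-draft-n-max 2, as described in the edit.
runs = {
    "MTP, --spec-draft-n-max 2": 22.3,
    "MTP, --spec-draft-n-max 3 + --spec-draft-p-min 0.75": 24.5,
}


def speedup_percent(tps, baseline=BASELINE_TPS):
    """Relative generation speedup over the baseline, in percent."""
    return (tps / baseline - 1.0) * 100.0


for label, tps in runs.items():
    print(f"{label}: {tps} tps ({speedup_percent(tps):+.1f}% vs. non-MTP)")
```

That works out to roughly +17% for the first MTP run and roughly +29% after the settings change, so a real gain but well short of a 2x improvement.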