r/LocalLLaMA

MTP is nice and all, but what about PP speeds?

I don't know about the rest of you, but on my setup, as soon as I enable MTP, prompt processing (PP) performance and GPU usage drop significantly for some reason. For me it's not so much a memory issue as a drop in performance.

My setup is: 2x Radeon VII 16 GB on ROCm, 1x RTX 3080 8 GB Max-Q on Vulkan. I'm running Qwen 3.6 27B with the KV cache at Q8. The Radeon VIIs are on 4x PCIe risers, so maybe it is a bus contention issue?

That said, I also tried going full Vulkan, but that makes it worse by a long shot.

Can anyone here explain why that is the case?

submitted by /u/milpster