MTP is nice and all, but what about PP speeds?
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
I don't know about the rest of you, but on my setup, as soon as I enable MTP, PP performance and GPU usage drop significantly for some reason. For me it's less a memory issue than a drop in performance.
My setup: 2x Radeon VII 16 GB on ROCm and 1x RTX 3080 8 GB Max-Q on Vulkan, running Qwen 3.6 27B with the KV cache at Q8. The Radeon VIIs are on 4x PCIe risers, so maybe it's a bus contention issue?
That said, I also tried going full Vulkan, but that makes it worse by a long shot.
Could anyone here explain why that is the case?