Finally seeing benefits of MTP after removing GGML_CUDA_ALLREDUCE
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
Been fighting this for a while, with MTP seeing lows of 17 to sometimes the 30s. Today I dug deep and tried so many different configurations, cmake rebuilds, you name it. After all that, I finally tried removing GGML_CUDA_ALLREDUCE and saw a nice uplift in tps!
Just posting in case anyone sees this and finds themselves in a similar situation. It didn't occur to me to remove that env var because it's usually considered beneficial, but once I removed it, whammo!
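For anyone wanting to try the same thing, here is a rough sketch of checking for the variable and clearing it in the current shell before relaunching. The helper name `allreduce_status` is made up for illustration, and this assumes the variable was exported in your shell session; if it is set in a startup file like `~/.bashrc` or in a service definition, it would need to be removed there too.

```shell
# Hypothetical helper: report whether GGML_CUDA_ALLREDUCE is present
# in the environment of the current shell (even if set to an empty value).
allreduce_status() {
  if [ -n "${GGML_CUDA_ALLREDUCE+x}" ]; then
    echo "set"
  else
    echo "unset"
  fi
}

# See what the current session has before changing anything.
allreduce_status

# Remove the variable from this shell session only.
unset GGML_CUDA_ALLREDUCE

# Confirm it is gone, then relaunch your usual command from this same shell.
allreduce_status
```

Comparing tps with the variable set and then unset, with everything else left identical, should show whether it makes a difference on your setup.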