Anyone else running one of the pre-release branches of MTP support to maintain the higher speeds?
I can't help myself; it's ~20% faster for me. I took the highest-speed branch (for my setup), added the vision fix, and am just riding it out for now.
Dual Xeon 8268, 1.5 TB DDR4-2666, Tesla T4
~122 t/s prompt eval, ~38 t/s output
I tried the release build today, and during some light coding llama.cpp crashed and the model had to restart. I never hit any crashes on the pre-release versions personally, so I jumped back to them.
On the actual release branch I get ~110 t/s prompt eval, ~30 t/s output.
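If you want to sanity-check numbers like these between builds, here's a minimal sketch that times a single completion against a running llama-server instance. The URL, prompt, and token count are placeholders for your setup; the server-side timings fields are only read if the build returns them.

```python
# Rough throughput check against a running llama-server instance.
# Sketch only: server URL, prompt, and n_predict are placeholder values.
import json
import time
import urllib.request

SERVER = "http://localhost:8080/completion"  # adjust to your llama-server
PROMPT = "Write a short docstring for a function that merges two sorted lists."
N_PREDICT = 256

payload = json.dumps({
    "prompt": PROMPT,
    "n_predict": N_PREDICT,
    "temperature": 0.0,  # keep runs comparable across builds
}).encode()

req = urllib.request.Request(
    SERVER, data=payload, headers={"Content-Type": "application/json"}
)

start = time.perf_counter()
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)
elapsed = time.perf_counter() - start

# Wall-clock rate includes prompt processing, so it understates pure
# generation speed; use the server-reported timings when available.
generated = body.get("timings", {}).get("predicted_n", N_PREDICT)
print(f"{generated} tokens in {elapsed:.2f}s -> {generated / elapsed:.1f} t/s wall-clock")
if "timings" in body:
    t = body["timings"]
    print(f"server-side: {t['prompt_per_second']:.1f} t/s prompt eval, "
          f"{t['predicted_per_second']:.1f} t/s generation")
```

Run it a couple of times against each build (first run warms the cache) so the release-vs-prerelease comparison isn't skewed by a cold start.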
Just curious what everyone else is doing, and whether anyone is aware of any major downsides to the early builds.