Now that MTP is merged... What are the best outputs you're getting on Qwen 3.6 35B on 2x3090s?
Mirrored from r/LocalLLaMA for archival readability.
We've got great outputs for 27B via club 3090, but what about those of us who love the blazing speed of 35B on dual 3090s?
I was getting 1500 t/s prompt processing and 120 t/s generation with split layers, but MTP slowed it down to 80 t/s generation when I tested last week. I'm sticking with my CPU-overflow fallback of 3500 t/s prompt processing and 80 t/s generation until someone cooks up something à la the geniuses over at club 3090.
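For anyone wanting to reproduce this kind of comparison, a minimal sketch of an A/B run with `llama-bench` against two llama.cpp builds (the build directories, model filename, and quant are placeholders for your own setup; `-p`/`-n`/`-ngl`/`-ts` are standard `llama-bench` flags):

```shell
# Benchmark prompt processing (-p 512) and generation (-n 128) on both builds,
# offloading all layers (-ngl 99) and splitting tensors evenly across two 3090s (-ts 1,1).
./build-mtp/bin/llama-bench  -m qwen3.6-35b-q4_k_m.gguf -ngl 99 -ts 1,1 -p 512 -n 128
./build-prev/bin/llama-bench -m qwen3.6-35b-q4_k_m.gguf -ngl 99 -ts 1,1 -p 512 -n 128
```

`llama-bench` prints separate `pp` and `tg` rows with t/s figures, so the prompt-processing and generation deltas between builds can be read off directly.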
What have you tried so far with the new llama.cpp MTP merge? Any big jump over your previous best build for 35B?