r/LocalLLaMA · 1 min read

Now that MTP is merged... What's the best outputs you're getting on Qwen 3.6 35B on 2x3090s?


We've got great outputs for 27B via club 3090, but what about those of us who love the blazing speed of 35B on dual 3090s?

I was getting 1500 t/s prompt processing (pp) and 120 t/s generation (tg) with split layers, but MTP slowed generation to 80 t/s when I tested last week. I'm sticking with my CPU-overflow fallback (3500 pp / 80 tg) until someone cooks up something à la the geniuses over at club 3090.
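Whether a build "wins" depends on workload shape: prefill-heavy jobs favor high pp, chat-style jobs favor high tg. A quick back-of-envelope sketch (Python; the helper and the 4096-in / 512-out workload are illustrative, the throughput figures are the ones quoted above):

```python
# Rough end-to-end latency from prefill (pp) and decode (tg) throughput.
# Throughput numbers are the ones quoted in the post; the helper and
# workload sizes are made up for illustration.

def total_seconds(n_prompt: int, n_gen: int, pp: float, tg: float) -> float:
    """Wall time estimate: prefill at pp tok/s, then decode at tg tok/s."""
    return n_prompt / pp + n_gen / tg

builds = [
    ("old split-layer", 1500.0, 120.0),  # pre-MTP split-layer build
    ("MTP split-layer", 1500.0, 80.0),   # MTP build, slower decode
    ("CPU-overflow",    3500.0, 80.0),   # the fallback from the post
]

for name, pp, tg in builds:
    t = total_seconds(4096, 512, pp, tg)
    print(f"{name}: {t:.1f}s for 4096-in / 512-out")
```

For that shape the pre-MTP split-layer build still comes out ahead, and the CPU-overflow fallback beats the MTP build on its faster prefill alone, which matches the decision above.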

What have you tried so far with the new llama.cpp MTP merge? Any big jump over your previous best build for 35B?

submitted by /u/youcloudsofdoom
