Qwen 27B for planning, Qwen 35B-A3B for execution?
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
My 32GB unified memory setup runs both, though 27B even with MTP is something like 7-10 tok/sec. Usable but not real time by any means. (~18 tok/sec with 35B-A3B)
Would it be worth using 27B to plan long horizon tasks, put together the PLAN.md, and have 35B-A4B iterate over it quickly? I can't load both models together, so I'd swap once the plan is set.
Right now I'm using the latter exclusively but am wondering whether the differences in intelligence are as pronounced as some here say.
[link] [comments]
More from r/LocalLLaMA
-
Been running Qwen3.6-27B through a 3-critic harness. The harness matters more than I thought
Jun 30
-
I Hate Dario Amodei, and everything he stands for.
Jun 29
-
Introducing LongCat-2.0 - , a large-scale MoE language model with 1.6 trillion total parameters and ~48 billion activated per token. This was the stealth model that was on Openrouter under the name 'owl-alpha'.
Jun 29
-
Krea-2-Turbo Image Model - Easy to be fully uncensored, but it can also EDIT Images!
Jun 29
Discussion (0)
Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.
Sign in →No comments yet. Sign in and be the first to say something.