r/LocalLLaMA

Qwen 27B for planning, Qwen 35B-A3B for execution?

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

My 32GB unified-memory setup runs both, though 27B, even with MTP (multi-token prediction), only manages about 7-10 tok/sec: usable, but not real time by any means. 35B-A3B gets ~18 tok/sec.
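
For a rough sense of what those rates mean in wall-clock terms, here is a back-of-the-envelope sketch. The 2,000-token plan length is a hypothetical figure, and the estimate counts decode time only, ignoring prompt processing and model load time.

```python
def generation_seconds(tokens: int, tok_per_sec: float) -> float:
    """Decode time only; ignores prompt processing (prefill) and model load/swap time."""
    return tokens / tok_per_sec


PLAN_TOKENS = 2_000  # hypothetical plan length, not a figure from the post

for label, rate in [("27B (slow end)", 7), ("27B (fast end)", 10), ("35B-A3B", 18)]:
    print(f"{label}: {generation_seconds(PLAN_TOKENS, rate):.0f} s")
# 27B (slow end): 286 s
# 27B (fast end): 200 s
# 35B-A3B: 111 s
```

Since the plan is written once but executed over many iterations, the slower model's cost is paid a single time while the faster model's rate dominates the loop.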

Would it be worth using 27B to plan long-horizon tasks and write the PLAN.md, then having 35B-A3B iterate over it quickly? I can't load both models at once, so I'd swap once the plan is set.
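
A minimal sketch of the swap-once workflow described above, assuming a hypothetical `LocalModel` wrapper (not a real library API; replace `generate`, `load` and `unload` with whatever your runtime provides) and a PLAN.md written as a `- [ ]` checklist. The checklist format and prompt wording are illustrative assumptions.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class LocalModel:
    """Hypothetical stand-in for a local inference backend; not a real library API."""
    name: str
    generate: Callable[[str], str]
    loaded: bool = False

    def load(self) -> None:
        self.loaded = True   # a real backend would map weights into memory here

    def unload(self) -> None:
        self.loaded = False  # ...and release them here


def parse_plan(markdown: str) -> list[str]:
    """Extract unchecked '- [ ] step' items from a PLAN.md-style checklist."""
    steps = []
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("- [ ] "):
            steps.append(stripped[len("- [ ] "):])
    return steps


def plan_then_execute(task: str, planner: LocalModel, executor: LocalModel,
                      plan_path: Path) -> list[str]:
    # Phase 1: the slower dense model writes the plan once.
    planner.load()
    try:
        plan_md = planner.generate(f"Write a PLAN.md checklist for: {task}")
    finally:
        planner.unload()  # both models can't be resident at once, so free memory first
    plan_path.write_text(plan_md)

    # Phase 2: the faster MoE model works through each step with the plan as context.
    executor.load()
    try:
        return [executor.generate(f"Plan:\n{plan_md}\n\nDo this step: {step}")
                for step in parse_plan(plan_path.read_text())]
    finally:
        executor.unload()


if __name__ == "__main__":
    import tempfile
    # Stub lambdas stand in for real model calls.
    planner = LocalModel("27B", lambda p: "- [ ] write tests\n- [ ] implement parser\n")
    executor = LocalModel("35B-A3B", lambda p: "done: " + p.rsplit(": ", 1)[-1])
    with tempfile.TemporaryDirectory() as d:
        print(plan_then_execute("toy task", planner, executor, Path(d) / "PLAN.md"))
    # ['done: write tests', 'done: implement parser']
```

Writing the plan to disk between phases means the swap loses nothing: the executor rereads PLAN.md after the planner has been unloaded.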

Right now I'm using 35B-A3B exclusively, but I'm wondering whether the difference in intelligence is as pronounced as some here say.

submitted by /u/mailto_devnull
