r/LocalLLaMA · May 23, 2026 · 1 min read

Any reason to run dense over MOE for RAGs?

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

I tend to use Claude for a lot of research and I also increasingly worry about things like misinformation or things in the model I can't audit. So, I'm building my own all in one RAG with big datasets like all of Wiki, research papers, all the typical big data sets people like to grab. Then lots of books as well. Then, I do a lot of stuff like claim and argument extraction and such, but I won't get deep into that yet, it's still getting built.

I was using qwen3.6 27b MTP for my inline chat for a while without even considering MOE cause this sub kinda led me to thinking MOE = bad. 27b = king. But, I started doing tests with it and I'm getting much better answers with qwen3.6 35b APEX. It seems to be grabbing way more information, bringing up way more points than what dense was finding. Dense didn't seem to compete hardly really. 150 tok/s is also nicer than 60 tok/s (I'm running a single 3090).

I know people are much more interested in models for coding (believe me, I like it as well), but is there an advantage MOE has over dense for RAG specifically? If anybody even does RAG anymore, information that's not bot driven seems hard to find sometimes.

submitted by /u/vick2djax
[link] [comments]

Discussion (0)

No comments yet. Sign in and be the first to say something.

Discussion (0)

More from r/LocalLLaMA