
Mutating Gemma 4 31B Dense into a native Gemma 4 additive-MoE model

Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.

I recently came across an interesting model on Hugging Face: JDONE-Research/AIOne-Agent-52B-A36B-it. It is the first finetune I have seen that is built on the Gemma 4 31B dense model but enables MoE for it, training a router and experts and turning on the enable_moe_block config option, as Gemma 4 26B does. I was surprised that this "feature" hasn't been discussed more, since it seems like an interesting way to further post-train Gemma 4 31B, updating its knowledge and giving it enhanced capabilities through MoE.
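To make the idea concrete, here is a toy sketch of what an "additive" MoE block could look like, assuming "additive" means the routed experts' output is added on top of the existing dense MLP output rather than replacing it. This is not Gemma 4's or JDONE's actual code: `additive_moe`, `matvec`, the single-matrix "experts", and the top-k softmax routing are all hypothetical stand-ins chosen for illustration.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a short list of floats.
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, x):
    # Plain matrix-vector product; W is a list of rows.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def additive_moe(x, dense_W, router_W, expert_Ws, top_k=2):
    # Hypothetical additive block: dense path plus a weighted sum of
    # the top_k routed experts. Real layers use gated MLPs, not one matrix.
    dense_out = matvec(dense_W, x)          # original dense path, unchanged
    logits = matvec(router_W, x)            # one router logit per expert
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    weights = softmax([logits[i] for i in chosen])  # renormalize over chosen experts
    out = list(dense_out)
    for w, i in zip(weights, chosen):
        expert_out = matvec(expert_Ws[i], x)
        out = [o + w * e for o, e in zip(out, expert_out)]
    return out, chosen, weights

# Toy usage: 3-dim hidden state, 4 experts, all experts zero-initialized.
x = [1.0, -2.0, 0.5]
dense_W = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]                # identity "dense MLP"
router_W = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, -1]]   # 4 router rows
zero_experts = [[[0.0] * 3 for _ in range(3)] for _ in range(4)]
out, chosen, weights = additive_moe(x, dense_W, router_W, zero_experts)
```

One property worth noting in this sketch: with the experts initialized to zero, the block's output equals the dense output exactly, so training would start from the unmodified base model's behavior. Whether the JDONE finetune initializes its experts this way is not something I know; it is just one plausible choice.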

Unfortunately, the JDONE finetune is Korean-specific, but I was curious whether anybody in the community has come across or explored similar Gemma 4 31B-based models extended with MoE. I had some spare RunPod credits, so I worked iteratively with ChatGPT Pro on a training script that would take around 24 hours to complete on a B300, aiming for a proof-of-concept model to see whether this augmented architecture actually works. I have fairly little experience with full training (I have only finetuned a couple of times through Unsloth), so if anyone with more experience has suggestions, I'm very open to feedback!

submitted by /u/SemaMod