Does this idea sound fun? [R]
Mirrored from r/MachineLearning for archival readability. Support the source by reading on the original site.
It's about inference-time learning in a Mixture-of-Experts (MoE) model: I insert a few experts that specialize in updating the weights of their sibling experts. All the components needed already existed, but no one had tried it inside an MoE, so I built a small proof of concept. It kind of worked. I'd love to hear what you think.
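To make the idea concrete, here is a toy sketch of what "an expert that updates a sibling expert's weights at inference time" could look like. It is not the PoC from the post, which doesn't describe how the updates are computed. Everything here is an assumption for illustration: the `TinyMoE` class, the single `updater` matrix, top-1 routing, and the use of a normalized delta-rule (fast-weight) write as the update.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class TinyMoE:
    """Hypothetical toy MoE layer with one weight-updating expert."""

    def __init__(self, dim, n_experts, lr=0.1, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.dim = dim
        self.lr = lr
        self.router = rand_matrix(n_experts, dim)
        self.experts = [rand_matrix(dim, dim) for _ in range(n_experts)]
        # Assumption: the updater expert maps the input to a target output
        # that the routed sibling expert should move toward.
        self.updater = rand_matrix(dim, dim)

    def forward(self, x, learn=False):
        gates = softmax(matvec(self.router, x))
        top = max(range(len(gates)), key=gates.__getitem__)  # top-1 routing
        y = matvec(self.experts[top], x)
        if learn:
            # Delta-rule write: W += lr * (v - W x) x^T / (x . x).
            # No gradients or backward pass; the sibling's weights change
            # during the forward pass itself.
            v = matvec(self.updater, x)
            norm = sum(xi * xi for xi in x) or 1.0
            W = self.experts[top]
            for i in range(self.dim):
                err = v[i] - y[i]
                for j in range(self.dim):
                    W[i][j] += self.lr * err * x[j] / norm
        return y, top
```

With this particular update, calling `forward(x, learn=True)` moves the routed expert's output for that same `x` a fraction `lr` of the way toward the updater's target, so repeated exposure to an input changes how the layer responds to it without any training step.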