Hugging Face Daily Papers · June 11, 2026 · 6 min read

Redesign Mixture-of-Experts Routers with Manifold Power Iteration

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

Author's comment (Songhao Wu): We propose a redesign of the MoE router that applies power iteration during the forward pass to couple router weights and expert parameters within the singular space of the parameters. We contend that this imposes an explicit constraint that forces router weights to better reflect the parametric characteristics of the expert weights, resulting in better expert routing. Our initial results and extensive analysis validate the effectiveness of this design. We hope our work inspires researchers to rethink MoE routers and leads to more valuable insights for future router designs.

arxiv:2606.12397

Published on Jun 10

Submitted by Songhao Wu on Jun 11

#1 Paper of the day

Authors: Songhao Wu, Ang Lv, Ruobing Xie, Yankai Lin

Abstract

Researchers propose a novel router redesign for Mixture-of-Experts models that aligns router rows with the principal singular directions of expert matrices using Manifold Power Iteration to improve model effectiveness.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

The router is the cornerstone component of Mixture-of-Experts (MoE) models. Serving as expert proxies, the rows of the router matrix compute their similarity to the MoE inputs to determine which subset of experts is activated. Ideally, each router row should condense its expert matrix into a representative vector, so that its dot product with a token better reflects token-expert affinity. However, no design principle currently enforces this condensation. In this paper, we propose to align each router row with the principal singular direction of the associated expert, as this direction provides the most expressive mathematical description of a matrix. Based on this principle, we propose a router redesign with Manifold Power Iteration (MPI). Specifically, it introduces a "Power-then-Retract" paradigm, in which a power iteration step is performed on the router weights, followed by a retraction that imposes a norm constraint to ensure both efficiency and stability. Theoretically, we show that MPI drives router rows to converge toward the principal singular directions of their associated experts. Empirically, we pretrain MoE models at scales from 1B to 11B parameters and confirm that this alignment yields more effective MoE models.
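The "Power-then-Retract" step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: we assume each expert contributes a single matrix W_e of shape (d_ff, d_model), that a router row r_e lives in the d_model input space (so its target is the principal right singular vector of W_e), that the power step is r ← W_eᵀ W_e r, and that the retraction simply rescales r to a fixed norm. The function names and the `target_norm` parameter are hypothetical, and plain Python lists are used only to keep the sketch dependency-free.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major expert matrix, shape (d_ff, d_model)


def matvec(W: Matrix, v: Vector) -> Vector:
    """Return W @ v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rmatvec(W: Matrix, u: Vector) -> Vector:
    """Return W^T @ u without building the transpose."""
    out = [0.0] * len(W[0])
    for row, u_i in zip(W, u):
        for j, w in enumerate(row):
            out[j] += w * u_i
    return out


def norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


def power_then_retract(r: Vector, W: Matrix, target_norm: float = 1.0) -> Vector:
    """One hypothetical Power-then-Retract step for a single router row.

    Power:   r <- W^T W r, which amplifies the component of r along the
             principal right singular vector of W.
    Retract: rescale the result back onto the sphere of radius target_norm.
    """
    z = rmatvec(W, matvec(W, r))
    n = norm(z)
    if n == 0.0:
        return list(r)  # r lies in the null space of W; leave it unchanged
    return [target_norm * x / n for x in z]


def route_top_k(router: List[Vector], experts: List[Matrix], x: Vector, k: int) -> List[int]:
    """Apply one power-then-retract step to every router row, then return the
    indices of the k experts whose refreshed rows have the largest dot product
    with token x. The refreshed rows are not written back in this sketch."""
    rows = [power_then_retract(r, W) for r, W in zip(router, experts)]
    logits = [sum(a * b for a, b in zip(row, x)) for row in rows]
    return sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)[:k]


# Toy experts with known SVDs: W0's principal right singular vector is
# [1, 0, 0]; W1's is [0, 1, 0].
W0 = [[3.0, 0.0, 0.0],
      [0.0, 1.0, 0.0],
      [0.0, 0.0, 0.5],
      [0.0, 0.0, 0.0]]
W1 = [[0.0, 0.0, 0.0],
      [0.0, 2.0, 0.0],
      [0.0, 0.0, 0.1]]

r = [1.0, 1.0, 1.0]
for _ in range(10):
    r = power_then_retract(r, W0)
print([round(x, 4) for x in r])  # [1.0, 0.0, 0.0]

router = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]  # identical initial rows
print(route_top_k(router, [W0, W1], [1.0, 0.0, 0.0], k=1))  # [0]
print(route_top_k(router, [W0, W1], [0.0, 1.0, 0.0], k=1))  # [1]
```

Because W_eᵀ W_e has eigenvalues σ_i², each power step shrinks the non-principal components relative to the principal one by σ_i²/σ_1² (1/9 and 1/36 for W0), so the row lines up with the first axis within a few steps; the retraction only fixes the scale. In `route_top_k` one step is applied per call and the refreshed rows are discarded; whether and how the actual method persists them across training steps is not described in this abstract.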

arXiv page: arxiv.org/abs/2606.12397 · Code: https://github.com/ericshwu/Router-with-Manifold-Power-Iteration

Community

shwu (paper author and submitter) · about 15 hours ago · edited about 8 hours ago

noahml · about 8 hours ago

This is a neat approach to MoE routing. I like the idea of moving away from arbitrary router weights and instead using the principal singular direction of the experts to guide the selection process. It feels like a much more grounded way to define token-expert affinity than how most models currently handle it.

Since this uses a Power-then-Retract paradigm, how much computational overhead does it add to the training loop compared to standard routing?

I made a podcast about it with ResearchPod; it makes it easy to pick up the key concepts on the go:
https://researchpod.app/episode/b091d9ea-bfd5-4ea9-bced-18546d1f87e4

Models citing this paper 0

Datasets citing this paper 0

Spaces citing this paper 0

Collections including this paper 2
