Hugging Face Daily Papers · May 27, 2026 · 3 min read

MobileMoE: Scaling On-Device Mixture of Experts

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.


arxiv:2605.27358

Published on May 26, 2026 · Submitted by Jiasenlu on May 27 · AI at Meta

Authors: Yanbei Chen, Hanxian Huang, Ernie Chang, Jacob Szwejbka, Digant Desai, Zechun Liu, Vikas Chandra, Raghuraman Krishnamoorthi

Abstract

AI-generated summary: MobileMoE introduces efficient on-device Mixture-of-Experts language models with sub-billion active parameters that match or exceed dense baselines and existing MoE models while using fewer inference FLOPs or fewer parameters.

Mixture-of-Experts (MoE) has become the de facto architecture for hundred-billion-parameter language models, yet its advantages at sub-billion scales for on-device deployment remain largely unexplored. To close this gap, we present MobileMoE, a family of on-device MoE language models with sub-billion active parameters (0.3-0.9B active and 1.3-5.3B total) that establish a new Pareto frontier for on-device LLMs. We first formulate an on-device MoE scaling law that jointly optimizes MoE architecture under mobile memory and compute constraints, identifying an on-device sweet spot (moderate sparsity with fine-grained and shared experts) that is simultaneously memory- and compute-optimal. Building on the derived architectures, we train MobileMoE with a four-stage recipe covering pre-training, mid-training, instruction fine-tuning, and quantization-aware training, all on open-source datasets. Across 14 benchmarks, MobileMoE matches or exceeds leading on-device dense LLMs with 2-4× fewer inference FLOPs, and matches or surpasses the state-of-the-art MoE OLMoE-1B-7B with up to 60% fewer parameters. To bridge the last mile to mobile deployment, we provide the first efficient MoE inference on commodity smartphones with comprehensive on-device profiling. At comparable INT4 weight memory, MobileMoE-S delivers 1.8-3.8× faster prefill and 2.2-3.4× faster decode than the dense baseline MobileLLM-Pro.
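To make the memory and compute figures in the abstract more concrete, here is a rough back-of-the-envelope sketch. It is not taken from the paper: it assumes that INT4 weights cost exactly 4 bits each (ignoring quantization scales, embeddings kept at higher precision, and runtime overhead) and that a forward pass costs about 2 FLOPs per active parameter per token, a common rule of thumb. The function names are hypothetical, and the parameter counts are only the endpoints of the ranges quoted above, not specific model configurations.

```python
# Back-of-the-envelope estimates; all formulas are simplifying assumptions,
# not measurements or methods from the MobileMoE paper.

def int4_weight_gb(total_params: float, bits_per_weight: float = 4.0) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    Memory scales with TOTAL parameters, because every expert must be
    resident even though only a few are used per token.
    """
    return total_params * bits_per_weight / 8 / 1e9


def forward_gflops_per_token(active_params: float) -> float:
    """Approximate forward-pass cost in GFLOPs per token.

    Compute scales with ACTIVE parameters (rule of thumb: ~2 FLOPs each).
    """
    return 2 * active_params / 1e9


# Endpoints of the ranges quoted in the abstract.
TOTAL_PARAMS = (1.3e9, 5.3e9)
ACTIVE_PARAMS = (0.3e9, 0.9e9)

if __name__ == "__main__":
    for total in TOTAL_PARAMS:
        print(f"{total / 1e9:.1f}B total  -> ~{int4_weight_gb(total):.2f} GB of INT4 weights")
    for active in ACTIVE_PARAMS:
        print(f"{active / 1e9:.1f}B active -> ~{forward_gflops_per_token(active):.1f} GFLOPs per token")
```

Under these assumptions, the split between total and active parameters is what lets an MoE model trade memory for compute: weight storage follows the total count, while per-token cost follows the active count.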


Community

Jiasenlu (paper submitter): Seems really interesting and promising on mobile devices.


Get this paper in your agent:

hf papers read 2605.27358

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

