Hugging Face Daily Papers · 5 min read

Rethinking Muon Beyond Pretraining: Spectral Failures and High-Pass Remedies for VLA and RLVR

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2605.19282

Published on May 19, 2026 · Submitted by Chongyu Fan on May 25, 2026
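Authors: Chongyu Fan, Gaowen Liu, Mingyi Hong, Ramana Rao Kompella, Sijia Liu
Organization: OPTML Group @ MSU
Project page: https://chongyu-fan.netlify.app/posts/pion/
Code: https://github.com/OPTML-Group/Pion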

Abstract

AI-generated summary: Pion replaces Muon's uniform spectral whitening, which works well in LLM pretraining, with a high-pass NS iteration that stabilizes training in low-rank and low-SNR regimes while keeping Muon's computational efficiency and supporting per-head updates.

Muon is a matrix-aware optimizer that uses Newton-Schulz (NS) iterations to enforce spectral gradient orthogonalization, driving all singular values of the momentum matrix toward 1. While this uniform spectral whitening enhances exploration and outperforms AdamW in LLM pretraining, we show that it can lead to fundamental limitations beyond pretraining in two regimes: (i) cross-modality vision-language-action (VLA) training, where whitening the inherently low-rank gradients of the action module amplifies noisy tail directions, and (ii) reinforcement learning with verifiable rewards (RLVR), where low-SNR gradients and the need to preserve per-head specialization from prior training make whitening unstable.

To address these challenges, we propose Pion, a drop-in replacement for Muon that preserves its computational efficiency while replacing uniform spectral whitening with a two-stage Promotion+Suppression mechanism, which we call the high-pass NS iteration. This design induces a sharp spectral high-pass effect, anchoring dominant singular values at 1 while suppressing noisy tail components toward 0, with controllable filter strength. To preserve pretrained per-head heterogeneity, Pion also supports a per-head mode that applies updates independently across attention heads via a simple reshape, at no extra cost.

In VLA training on LIBERO and LIBERO-Plus, Pion consistently outperforms both Muon and AdamW across ℓ1-regression (VLA-Adapter) and flow-matching (VLANeXt) architectures; for example, with VLA-Adapter it reaches a 100% success rate on LIBERO Object after 1,500 training steps, versus 97.0% for Muon and only 32.2% for AdamW. Pion's advantage further extends to a real Franka Research 3 robot with a π0.5 backbone under the DROID setup on three grasp-and-place tasks. In RLVR post-training of Qwen3-1.7B/4B with GRPO and GMPO, Pion also outperforms AdamW on MATH and GSM8K, while Muon's performance collapses to zero.
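The whitening failure described in the abstract can be reproduced in a few lines. This is a simplified sketch under stated assumptions: it uses the classic cubic Newton-Schulz step X ← 1.5X − 0.5XXᵀX rather than Muon's tuned quintic coefficients, and a synthetic rank-2-plus-noise matrix (the hypothetical helper `low_rank_plus_noise`) stands in for a low-rank action-module gradient. Because the iteration pushes every nonzero singular value toward 1, the 0.05-scale noise tail ends up with the same magnitude as the two dominant directions.

```python
import numpy as np

def ns_orthogonalize(G: np.ndarray, steps: int = 30) -> np.ndarray:
    """Cubic Newton-Schulz: pushes every nonzero singular value of G toward 1.

    Muon uses a tuned quintic polynomial with fewer steps; the cubic map
    sigma -> 1.5*sigma - 0.5*sigma**3 is used here because it is easy to check.
    """
    X = G / (np.linalg.norm(G) + 1e-12)  # Frobenius scaling puts all singular values in (0, 1]
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X

def low_rank_plus_noise(seed: int = 0) -> np.ndarray:
    """Hypothetical test gradient: two strong directions plus a weak 30-dim noise tail."""
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((64, 32)))  # 64x32, orthonormal columns
    V, _ = np.linalg.qr(rng.standard_normal((32, 32)))  # 32x32 orthogonal
    sigma = np.concatenate([[10.0, 5.0], np.full(30, 0.05)])
    return U @ np.diag(sigma) @ V.T

G = low_rank_plus_noise()
# Compare largest, second, and smallest singular values before and after whitening.
print("input :", np.round(np.linalg.svd(G, compute_uv=False)[[0, 1, -1]], 3))
print("output:", np.round(np.linalg.svd(ns_orthogonalize(G), compute_uv=False)[[0, 1, -1]], 3))
```

The smallest singular value moves from the noise level up to the same value as the signal directions, which is the "amplification of noisy tail directions" the abstract attributes to uniform whitening.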
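The abstract describes the high-pass NS iteration by its effect (dominant singular values anchored at 1, tail components pushed toward 0, controllable filter strength) but does not give its polynomial here. The sketch below is therefore not Pion's Promotion+Suppression iteration; it is an illustrative matmul-only construction we chose to show that such a filter is possible. After scaling so the largest singular value is 1, a hypothetical "promotion" stage applies cubic NS steps, and a hypothetical "suppression" stage iterates the odd quintic p(σ) = 2.5σ³ − 1.5σ⁵, which has attracting fixed points at 0 and 1 and a repelling threshold at √(2/3) ≈ 0.816. With no promotion steps the relative cutoff sits at about 0.816; each promotion step lowers it (to roughly 0.63, 0.45, and 0.31 for 1, 2, and 3 steps), so `promote_steps` plays the role of a filter-strength knob. The function and parameter names are hypothetical.

```python
import numpy as np

def high_pass_ns(G: np.ndarray, promote_steps: int = 3, suppress_steps: int = 10) -> np.ndarray:
    """Illustrative spectral high-pass filter built from matmuls (NOT Pion's coefficients).

    Promotion (hypothetical): cubic NS steps raise mid-sized singular values toward 1;
    each extra step lowers the effective cutoff.
    Suppression (hypothetical): sigma -> 2.5*sigma**3 - 1.5*sigma**5 has attracting
    fixed points at 0 and 1 and a repelling one at sqrt(2/3), so it snaps each
    singular value to 0 or 1.
    """
    # Largest singular value -> 1. np.linalg.norm(G, 2) uses an SVD; a matmul-only
    # variant would estimate the spectral norm with a few power-iteration steps.
    X = G / (np.linalg.norm(G, 2) + 1e-12)
    for _ in range(promote_steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X
    for _ in range(suppress_steps):
        cubic = X @ X.T @ X        # same singular vectors, singular values sigma**3
        quintic = cubic @ X.T @ X  # singular values sigma**5
        X = 2.5 * cubic - 1.5 * quintic
    return X

# Hypothetical test gradient: singular values 10, 5, and a 30-dim tail at 0.05.
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((64, 32)))
V, _ = np.linalg.qr(rng.standard_normal((32, 32)))
G = U @ np.diag(np.concatenate([[10.0, 5.0], np.full(30, 0.05)])) @ V.T

for k in (0, 3):
    s = np.linalg.svd(high_pass_ns(G, promote_steps=k), compute_uv=False)
    print(f"promote_steps={k}:", np.round(s[:3], 3))
```

With `promote_steps=0` only the top direction survives, since the second direction sits at 0.5 of the top, below the 0.816 cutoff. With three promotion steps both signal directions are kept at 1 and the noise tail is driven to 0, in contrast to uniform whitening, which would map all 32 singular values to 1.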
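The per-head mode is described only as applying updates independently across attention heads "via a simple reshape". A minimal sketch of that idea follows, assuming a projection weight laid out as (n_heads * d_head, d_model) with each head's rows contiguous. This layout is an assumption, since real codebases differ. The helpers `batched_ns` and `per_head_update` are hypothetical, and the cubic NS step again stands in for whatever iteration the optimizer actually uses. Reshaping turns one large matrix into a batch of per-head matrices, so each head's block is orthogonalized on its own instead of being mixed with other heads' spectra.

```python
import numpy as np

def batched_ns(X: np.ndarray, steps: int = 30) -> np.ndarray:
    """Cubic Newton-Schulz applied independently to each matrix in a (..., m, n) batch."""
    norms = np.linalg.norm(X, axis=(-2, -1), keepdims=True)  # per-matrix Frobenius norm
    X = X / (norms + 1e-12)
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ np.swapaxes(X, -1, -2) @ X
    return X

def per_head_update(momentum: np.ndarray, n_heads: int) -> np.ndarray:
    """Hypothetical per-head mode: split rows into heads, orthogonalize each, reshape back."""
    rows, cols = momentum.shape
    assert rows % n_heads == 0, "rows must split evenly into heads"
    heads = momentum.reshape(n_heads, rows // n_heads, cols)  # (H, d_head, d_model)
    return batched_ns(heads).reshape(rows, cols)

rng = np.random.default_rng(1)
M = rng.standard_normal((8 * 16, 64))    # e.g. 8 heads, d_head=16, d_model=64
full = batched_ns(M)                     # whole-matrix update
split = per_head_update(M, n_heads=8)    # per-head update

head0 = split[:16]
print(np.allclose(head0 @ head0.T, np.eye(16), atol=1e-6))  # head block is orthonormal
```

Splitting along rows keeps each head's output block independent under the assumed layout. If a model stores heads along columns instead, as an output projection often does, the reshape would need to split the other axis.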

Community

Paper submitter about 8 hours ago

This paper proposes a new optimizer called Pion, which replaces Muon's uniform spectral whitening with a spectral high-pass filtering mechanism. This addresses the performance failures Muon shows outside LLM pretraining, in settings such as Vision-Language-Action (VLA) models and Reinforcement Learning with Verifiable Rewards (RLVR).

Get this paper in your agent:

hf papers read 2605.19282
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

Datasets citing this paper 0

Spaces citing this paper 0

Collections including this paper 0
