PolicyTrim: Boosting Intrinsic Policy Efficiency of Vision-Language-Action Models
Authors: Xianghui Wang, Feng Chen, Wenbo Zhang, Hua Yan, Zixuan Wang, Changsheng Li, Yinjie Lei
Organization: Sichuan University
Published: 2026-06-21 (arXiv: 2606.22540)
Project page: https://inceptionwang.github.io/PolicyTrim/
Code: https://github.com/INCEPTIONwang/PolicyTrim
Abstract
PolicyTrim is a reinforcement learning-based framework that enhances VLA model efficiency by extending reliable action chunk lengths and reducing redundant physical steps through dynamic exploration and redundancy-aware rewards.
Vision-Language-Action (VLA) models provide a unified paradigm for robotic manipulation, yet their real-world deployment is often bottlenecked by execution efficiency. While existing efforts predominantly focus on compute-centric efficiency to reduce per-step inference latency, the intrinsic policy efficiency of these models remains largely unexplored. Policy efficiency is fundamentally affected by two factors, namely the effective executable length of predicted action chunks and the total physical steps required to complete a task. These two factors jointly determine the total number of forward inference calls during execution. We observe that current VLA policies struggle with planning unreliability and action redundancy, suffering from severe prediction degradation at the tail of action chunks and tending to generate unnecessarily redundant physical steps. To address this, we propose PolicyTrim, a reinforcement learning-based post-training framework that extends the reliable action chunk length and reduces redundant physical steps. For reliable chunk extension, we employ a dynamic exploration strategy that explicitly rewards the successful completion of longer executable lengths, progressively pushing the trustworthy prediction horizon to its empirical limit. For step efficiency, we design a redundancy-aware reward that directly favors successful task completions with fewer steps while penalizing unreproducible shortcuts, effectively eliminating redundant physical actions. Extensive experiments across three benchmarks and three VLA models demonstrate that PolicyTrim improves action chunk utilization by 3× and reduces physical execution steps by 51.4%. Ultimately, our framework delivers up to a 5.83× end-to-end deployment speedup without compromising task success rates.
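The claim that the two factors jointly determine the number of forward inference calls can be made concrete with a small arithmetic sketch. It assumes a simplified execution loop that the abstract does not spell out: the policy is queried once per chunk, a fixed number of actions from each chunk is executed before the next query, and the episode ends after a fixed number of physical steps. The function name `inference_calls` and all numbers are hypothetical and chosen for illustration only; they are not results from the paper.

```python
import math


def inference_calls(total_steps: int, executed_per_chunk: int) -> int:
    """Count forward passes for one episode under a simplified model.

    Assumption (not from the paper): every forward pass yields exactly
    `executed_per_chunk` executed actions, so the episode needs
    ceil(total_steps / executed_per_chunk) passes.
    """
    if total_steps < 0:
        raise ValueError("total_steps must be non-negative")
    if executed_per_chunk <= 0:
        raise ValueError("executed_per_chunk must be positive")
    return math.ceil(total_steps / executed_per_chunk)


# Illustrative numbers only: a policy that needs 400 physical steps and
# executes 4 actions per predicted chunk.
baseline_calls = inference_calls(total_steps=400, executed_per_chunk=4)

# The same task after hypothetically halving the steps and tripling the
# number of actions executed per chunk.
trimmed_calls = inference_calls(total_steps=200, executed_per_chunk=12)

print(baseline_calls, trimmed_calls)
```

Under this model, shortening the episode and lengthening the executed portion of each chunk multiply together: either change alone reduces the call count, and both together reduce it further. Measured end-to-end speedup also depends on per-call latency and robot execution time, which this sketch ignores.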