Learning High-Frequency Continuous Action Chunks in Latent Space

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2605.24931

Published on May 24, 2026 · Submitted by Kunyun Wang on May 27, 2026
Authors: Kunyun Wang, Yuhang Zheng, Yupeng Zheng, Jieru Zhao, Wenchao Ding

Abstract

AI-generated summary: High-frequency robotic control is improved by using variational autoencoders to enhance temporal and spatial consistency, combined with a reuse-then-refine strategy for smooth real-time execution.

Modern robotic policies increasingly rely on action chunking to execute complex tasks in the physical world. While action chunking improves temporal consistency at moderate action frequencies, it becomes insufficient when the action frequency is increased further (e.g., to 60 Hz). At such high frequencies, policies often fail to generate actions that are both temporally smooth and spatially consistent. We address this challenge by shifting high-frequency action learning from the action space to a latent space learned with a variational autoencoder (VAE). This formulation significantly improves both the temporal and spatial consistency of high-frequency control. To enable smooth real-time execution, we further introduce Reuse-then-Refine, a chunk-level refinement strategy that improves continuity between adjacent action chunks under asynchronous inference. As a result, robots controlled by our policy can execute complex contact-rich tasks continuously, with fewer pauses and less jerky motion. Experiments on three real-world contact-rich robotic tasks show that our approach consistently completes tasks with smooth motions. Our code and data are available at https://github.com/tars-robotics/RTR.
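
The abstract does not spell out how Reuse-then-Refine works, so the following is only a generic illustration of the problem it targets: keeping motion continuous when a new action chunk arrives while the previous one is still executing. The sketch assumes scalar actions, two chunks aligned to the same start step, and a simple linear crossfade. The names `stitch_chunks`, `max_step`, `delay`, and `blend` are hypothetical and are not taken from the paper or its repository.

```python
def stitch_chunks(prev_chunk, new_chunk, delay, blend):
    """Join a newly inferred chunk onto the chunk currently being executed.

    Illustrative only; this is NOT the paper's Reuse-then-Refine algorithm.
    prev_chunk, new_chunk: lists of scalar actions indexed from the same step.
    delay: steps that elapsed during inference; these are reused from prev_chunk.
    blend: steps over which to crossfade linearly from prev_chunk to new_chunk.
    """
    if delay + blend > min(len(prev_chunk), len(new_chunk)):
        raise ValueError("delay + blend exceeds the chunk length")
    # Steps already committed while inference was running: reuse the old chunk.
    out = list(prev_chunk[:delay])
    # Crossfade: the weight on the new chunk rises from just above 0 to just below 1.
    for i in range(blend):
        w = (i + 1) / (blend + 1)
        t = delay + i
        out.append((1 - w) * prev_chunk[t] + w * new_chunk[t])
    # Remaining steps come from the new chunk unchanged.
    out.extend(new_chunk[delay + blend:])
    return out


def max_step(actions):
    """Largest jump between consecutive actions, a crude smoothness measure."""
    return max(abs(b - a) for a, b in zip(actions, actions[1:]))


prev_chunk = [0.0] * 8
new_chunk = [1.0] * 8
hard_switch = prev_chunk[:2] + new_chunk[2:]
stitched = stitch_chunks(prev_chunk, new_chunk, delay=2, blend=3)
print(max_step(hard_switch), max_step(stitched))
```

In this toy case a hard switch at the chunk boundary produces a jump of 1.0 between consecutive actions, while the crossfade limits it to 0.25. At 60 Hz each step lasts about 16.7 ms, so an inference latency of two steps corresponds to roughly 33 ms of actions that must come from the previous chunk.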


Get this paper in your agent:

hf papers read 2605.24931
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash


Datasets citing this paper 1

