Hugging Face Daily Papers · 7 min read

PhyMotion: Structured 3D Motion Reward for Physics-Grounded Human Video Generation

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2605.14269

PhyMotion: Structured 3D Motion Reward for Physics-Grounded Human Video Generation

Published on May 14, 2026 · Submitted by Han Lin on May 15, 2026
Project page: https://phy-motion.github.io/ · Code: https://github.com/h6kplus/PhyMotion
Authors: Yidong Huang, Zun Wang, Han Lin, Dong-Ki Kim, Shayegan Omidshafiei, Jaehong Yoon, Jaemin Cho, Yue Zhang, Mohit Bansal

Abstract

AI-generated summary: PhyMotion introduces a physics-grounded reward system for human motion generation that evaluates kinematic plausibility, contact consistency, and dynamic feasibility to improve video quality.

Generating realistic human motion is a central yet unsolved challenge in video generation. While reinforcement learning (RL)-based post-training has driven recent gains in general video quality, extending it to human motion remains bottlenecked by a reward signal that cannot reliably score motion realism. Existing video rewards primarily rely on 2D perceptual signals, without explicitly modeling the 3D body state, contact, and dynamics underlying articulated human motion, and often assign high scores to videos with floating bodies or physically implausible movements. To address this, we propose PhyMotion, a structured, fine-grained motion reward that grounds recovered 3D human trajectories in a physics simulator and evaluates motion quality along multiple dimensions of physical feasibility. Concretely, we recover SMPL body meshes from generated videos, retarget them onto a humanoid in the MuJoCo physics simulator, and evaluate the resulting motion along three axes: kinematic plausibility, contact and balance consistency, and dynamic feasibility. Each component provides a continuous and interpretable signal tied to a specific aspect of motion quality, allowing the reward to capture which aspects of motion are physically correct or violated. Experiments show that PhyMotion achieves stronger correlation with human judgments than existing reward formulations. These gains carry over to RL-based post-training, where optimizing PhyMotion leads to larger and more consistent improvements than optimizing existing rewards, improving motion realism across both autoregressive and bidirectional video generators under both automatic metrics and blind human evaluation (+68 Elo gain). Ablations show that the three axes provide complementary supervision signals, while the reward preserves overall video generation quality with only modest training overhead.
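
The abstract describes the reward at a pipeline level: recover SMPL meshes from the generated frames, retarget them onto a MuJoCo humanoid, and score the resulting motion along three axes. As a way to picture how such a structured reward decomposes, here is a minimal NumPy-only sketch that assumes the 3D trajectory has already been recovered and retargeted, and that per-frame joint angles, foot heights, and a center-of-mass track are available. Every constant, threshold, and function name below is a hypothetical placeholder for illustration, not PhyMotion's implementation, which evaluates the motion inside the MuJoCo simulator rather than with geometric proxies like these.

# Illustrative sketch only: a simplified, NumPy-only stand-in for the three
# reward axes named in the abstract. All limits and helper names are
# hypothetical placeholders, not the paper's MuJoCo-based reward.
import numpy as np

FPS = 30                     # assumed frame rate of the generated video
MAX_JOINT_SPEED = 12.0       # rad/s, hypothetical kinematic limit
MAX_COM_ACCEL = 30.0         # m/s^2, hypothetical dynamic-feasibility bound
FOOT_TOL = 0.03              # m, tolerated foot-ground gap or penetration


def kinematic_plausibility(joint_angles: np.ndarray) -> float:
    # joint_angles: (T, J) retargeted joint angles in radians.
    # Penalize joint velocities beyond a plausible articulation limit.
    vel = np.diff(joint_angles, axis=0) * FPS
    excess = np.clip(np.abs(vel) - MAX_JOINT_SPEED, 0.0, None)
    return float(np.exp(-excess.mean()))


def contact_balance_consistency(foot_heights: np.ndarray,
                                com_xy: np.ndarray,
                                support_xy: np.ndarray) -> float:
    # Penalize floating or penetrating feet and a center of mass that
    # drifts far from the support point (a crude balance proxy).
    gap = np.clip(np.abs(foot_heights.min(axis=1)) - FOOT_TOL, 0.0, None)
    drift = np.linalg.norm(com_xy - support_xy, axis=1)
    return float(np.exp(-(gap.mean() + drift.mean())))


def dynamic_feasibility(com_xyz: np.ndarray) -> float:
    # Penalize center-of-mass accelerations that would need implausible forces.
    accel = np.diff(com_xyz, n=2, axis=0) * FPS ** 2
    excess = np.clip(np.linalg.norm(accel, axis=1) - MAX_COM_ACCEL, 0.0, None)
    return float(np.exp(-excess.mean()))


def phymotion_style_reward(joint_angles, foot_heights, com_xyz, support_xy,
                           weights=(1.0, 1.0, 1.0)) -> float:
    # Combine the three per-axis scores into one scalar in (0, 1].
    scores = np.array([
        kinematic_plausibility(joint_angles),
        contact_balance_consistency(foot_heights, com_xyz[:, :2], support_xy),
        dynamic_feasibility(com_xyz),
    ])
    w = np.asarray(weights, dtype=float)
    return float((scores * w).sum() / w.sum())


# Usage with a correctly shaped (here static, hence maximally scored) trajectory:
# T, J = 60, 24
# reward = phymotion_style_reward(
#     joint_angles=np.zeros((T, J)),
#     foot_heights=np.zeros((T, 2)),
#     com_xyz=np.tile([0.0, 0.0, 0.9], (T, 1)),
#     support_xy=np.zeros((T, 2)),
# )

Because each helper returns a bounded per-axis score, the combined reward stays continuous and interpretable: a floating body lowers only the contact-and-balance term, while a teleporting limb lowers the kinematic term, which is the property the abstract attributes to the three-axis design.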

Community

Paper submitter (Han Lin) · May 15, 2026

Generating realistic human motion is a central yet unsolved challenge in video generation. While reinforcement learning (RL)-based post-training has driven recent gains in general video quality, extending it to human motion remains bottlenecked by a reward signal that cannot reliably score motion realism. Existing video rewards primarily rely on 2D perceptual signals, without explicitly modeling the 3D body state, contact, and dynamics underlying articulated human motion, and often assign high scores to videos with floating bodies or physically implausible movements. To address this, we propose PhyMotion, a structured, fine-grained motion reward that grounds recovered 3D human trajectories in a physics simulator and evaluates motion quality along multiple dimensions of physical feasibility. Concretely, we recover SMPL body meshes from generated videos, retarget them onto a humanoid in the MuJoCo physics simulator, and evaluate the resulting motion along three axes: kinematic plausibility, contact and balance consistency, and dynamic feasibility. Each component provides a continuous and interpretable signal tied to a specific aspect of motion quality, allowing the reward to capture which aspects of motion are physically correct or violated. Experiments show that PhyMotion achieves stronger correlation with human judgments than existing reward formulations. These gains carry over to RL-based post-training, where optimizing PhyMotion leads to larger and more consistent improvements than optimizing existing rewards, improving motion realism across both autoregressive and bidirectional video generators under both automatic metrics and blind human evaluation (+68 Elo gain). Ablations show that the three axes provide complementary supervision signals, while the reward preserves overall video generation quality with only modest training overhead.

Yidong Huang · May 15, 2026

We have open-sourced our code at https://github.com/h6kplus/PhyMotion


Get this paper in your agent:

hf papers read 2605.14269
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash


