EgoPhys: Learning Generalizable Physics Models of Deformable Objects from Egocentric Video
Authors: Hyunjin Kim, Ri-Zhao Qiu, Guangqi Jiang, Xiaolong Wang
Organization: University of California at San Diego
Published: 2026-06-15 (arXiv: 2606.16202)
Project page: https://hjhyunjinkim.github.io/EgoPhys/
Abstract
EgoPhys enables deformable digital twin generation from egocentric RGB video by using generalizable priors and compact codebooks to predict dense spring stiffness fields without per-spring optimization.
Humans naturally understand object physics through everyday interactions, but faithfully predicting complex deformable dynamics, such as those of elastic materials and fabrics, remains a major challenge for computer vision and robotics. We present EgoPhys, a framework that constructs deformable physical digital twins from egocentric RGB-only video using generalizable priors. EgoPhys distills per-object inverse-physics solutions into a compact codebook, which lets it predict dense spring stiffness fields for unseen objects without per-spring test-time optimization and thereby generate controllable deformable digital twins from egocentric videos. Trained with generalizable priors from diverse egocentric interactions, EgoPhys outperforms baselines in reconstruction, future prediction, and zero-shot generalization. To support training and evaluation, we curate an egocentric interaction dataset covering diverse deformable objects, scenes, and manipulation styles. We deploy EgoPhys on a real xArm6 robot, demonstrating that a digital twin initialized from a single egocentric human play video can serve as an internal world representation to aid deformable-object planning, highlighting egocentric RGB observations as a scalable path toward real-to-sim pipelines.
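To make "predicting dense spring stiffness fields from a compact codebook" more concrete, here is a minimal toy sketch. It is not the EgoPhys implementation: the abstract does not specify how codebook entries are combined or which integrator is used, so the softmax blending, the 1D particle chain, the semi-implicit Euler step, and every name below (`stiffness_from_codebook`, `spring_forces`, `step`) are assumptions made purely for illustration.

```python
import math

def stiffness_from_codebook(logits, codebook):
    """Per-spring stiffness as a softmax-weighted blend of codebook entries.

    Assumption: each spring carries one logit per codebook entry. The real
    method may select or combine entries differently.
    """
    stiffness = []
    for row in logits:
        m = max(row)  # subtract the max for a numerically stable softmax
        weights = [math.exp(v - m) for v in row]
        total = sum(weights)
        stiffness.append(sum(w * c for w, c in zip(weights, codebook)) / total)
    return stiffness

def spring_forces(x, springs, rest_len, k):
    """Hooke's-law forces on a 1D particle chain (endpoints must not coincide)."""
    forces = [0.0] * len(x)
    for (i, j), r, ks in zip(springs, rest_len, k):
        d = x[j] - x[i]
        length = abs(d)
        direction = d / length
        f = ks * (length - r) * direction  # stretched spring pulls i toward j
        forces[i] += f
        forces[j] -= f  # equal and opposite reaction
    return forces

def step(x, v, springs, rest_len, k, mass=1.0, dt=1e-3, damping=0.99):
    """One semi-implicit Euler step: update velocities, then positions."""
    f = spring_forces(x, springs, rest_len, k)
    v = [damping * (vi + dt * fi / mass) for vi, fi in zip(v, f)]
    x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x, v

# Hypothetical usage: a 3-entry codebook and two springs in a chain.
codebook = [10.0, 100.0, 1000.0]
logits = [[50.0, 0.0, 0.0], [0.0, 0.0, 50.0]]  # soft, stiff
k = stiffness_from_codebook(logits, codebook)
springs = [(0, 1), (1, 2)]
rest_len = [1.0, 1.0]
x, v = step([0.0, 1.2, 2.0], [0.0, 0.0, 0.0], springs, rest_len, k)
```

The point of the sketch is the parameter count: the simulator still sees one stiffness per spring, but those values are derived from a handful of shared codebook entries rather than optimized spring by spring at test time.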