Hugging Face Daily Papers · 4 min read

World Model for Robot Learning: A Comprehensive Survey

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2605.00080

Published on Apr 30, 2026 · Submitted by Leng Sicong on May 13, 2026

Authors: Bohan Hou, Gen Li, Jindou Jia, Tuo An, Xinying Guo, Sicong Leng, Haoran Geng, Yanjie Ze, Tatsuya Harada, Philip Torr, Oier Mees, Marc Pollefeys, Zhuang Liu, Jiajun Wu, Pieter Abbeel, Jitendra Malik, Yilun Du, Jianfei Yang

Project page: https://ntumars.github.io/wm-robot-survey/
GitHub: https://github.com/NTUMARS/Awesome-World-Model-for-Robotics-Policy

AI-generated summary

World models, as predictive representations of environmental dynamics, have become essential for robot learning, supporting policy learning, planning, and simulation across various embodied applications.

Abstract

World models, which are predictive representations of how environments evolve under actions, have become a central component of robot learning. They support policy learning, planning, simulation, evaluation, and data generation, and they have advanced rapidly with the rise of foundation models and large-scale video generation. However, the literature remains fragmented across architectures, functional roles, and embodied application domains. To address this gap, we present a comprehensive review of world models from a robot-learning perspective. We examine how world models are coupled with robot policies, how they serve as learned simulators for reinforcement learning and evaluation, and how robotic video world models have progressed from imagination-based generation to controllable, structured, and foundation-scale formulations. We further connect these ideas to navigation and autonomous driving, and summarize representative datasets, benchmarks, and evaluation protocols. Overall, this survey systematically reviews the rapidly growing literature on world models for robot learning, clarifies key paradigms and applications, and highlights major challenges and future directions for predictive modeling in embodied agents. To facilitate continued access to newly emerging works, benchmarks, and resources, we will maintain and regularly update the accompanying GitHub repository alongside this survey.
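To make the abstract's definition concrete, here is a minimal sketch of a world model used as a learned simulator for planning. It is not taken from the survey: the 1-D environment, the linear model form, and all names (`true_step`, `fit_world_model`, `plan`) are hypothetical, chosen only to illustrate the idea of predicting how a state evolves under actions and then choosing actions by imagined rollouts.

```python
from itertools import product

ACTIONS = (-1.0, 0.0, 1.0)  # assumed discrete action set for the toy example


def true_step(x, a):
    """Hypothetical ground-truth environment: a 1-D point pushed by the action."""
    return x + 0.5 * a


def fit_world_model(transitions):
    """Fit x' ~ x + w * a by least squares over (x, a, x') tuples."""
    num = sum(a * (x_next - x) for x, a, x_next in transitions)
    den = sum(a * a for _, a, _ in transitions)
    w = num / den
    return lambda x, a: x + w * a


def plan(model, x, goal, horizon=3):
    """Imagine every action sequence in the model; return the best first action."""
    def imagined_cost(seq):
        state, cost = x, 0.0
        for a in seq:
            state = model(state, a)      # rollout in the learned model only
            cost += abs(state - goal)    # accumulated distance to the goal
        return cost
    best = min(product(ACTIONS, repeat=horizon), key=imagined_cost)
    return best[0]


# Collect a few real transitions, then learn the model from them.
transitions = [(x, a, true_step(x, a)) for x in (0.0, 1.0, 2.0) for a in ACTIONS]
model = fit_world_model(transitions)

# Closed loop: plan in imagination, act in the real environment, replan.
x, goal = 0.0, 2.0
for _ in range(6):
    x = true_step(x, plan(model, x, goal))
```

The planner never queries `true_step`; it only evaluates candidate action sequences inside the fitted model, which is the sense in which a world model acts as a learned simulator. Real robotic world models replace the scalar linear model with learned latent or video predictors, and exhaustive enumeration with sampling-based or gradient-based planners.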

Community

Paper submitter about 17 hours ago

A policy-centric survey of predictive world models for robot policy learning, planning, simulation, evaluation, data generation, and robotic video generation.


Get this paper in your agent:

hf papers read 2605.00080
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
