Hugging Face Daily Papers · 4 min read

DreamX-World 1.0: A General-Purpose Interactive World Model

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

Project Page: https://amap-ml.github.io/DreamX_World
Code: https://github.com/AMAP-ML/DreamX-World
Model: https://huggingface.co/GD-ML/DreamX-World-5B
arxiv:2606.16993

Published on Jun 15 · Submitted by Adina Yakefu on Jun 16
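Authors: DreamX Team, Yancheng Bai, Rui Chen, Xiangxiang Chu, Rujing Dang, Hao Dou, Bingjie Gao, Qiwen Gu, Siyu Hong, Jiachen Lei, Geng Li, Jifan Li, Ruimin Lin, Qingfeng Shi, Bingze Song, Lei Sun, Jing Tang, Ruitian Tian, Jun Wang, Jiahong Wu, Pengfei Zhang, Shen Zhang, Jiashu Zhu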

Abstract

DreamX-World 1.0 is an interactive text/image-to-video model that generates long-horizon content with camera control and scene persistence using specialized encoding, training techniques, and optimization methods.

DreamX-World 1.0 is a general-purpose interactive text/image-to-video world model for controllable long-horizon generation. It supports camera navigation, revisits to previously observed regions, and promptable events across photorealistic, game-style, and stylized domains. Our data engine combines camera-accurate Unreal Engine rendering, action-rich gameplay recordings, and real-world videos with recovered camera geometry. For camera control, we introduce E-PRoPE, a lightweight variant of projective positional encoding that retains PRoPE's projective camera geometry while applying camera-aware attention to spatially reduced tokens. We convert a bidirectional video generator into a few-step autoregressive world model using causal forcing, DMD-style distillation, and long-rollout training. Training on self-generated long-horizon contexts exposes the model to its own generated history and reduces the style and color drift that accumulates across autoregressive chunks. Memory-Conditioned Scene Persistence retrieves earlier views through camera-geometry-based retrieval, while residual recycling makes the conditioning path less sensitive to imperfect memory latents. Event Instruction Tuning adds composable event control, and reinforcement learning alignment recovers camera control and visual quality after distillation. With mixed-precision DiT execution, residual reuse, 75%-pruned VAE decoding, and asynchronous pipeline parallelism, DreamX-World 1.0 reaches up to 16 FPS on eight RTX 5090 GPUs. On our 5-second basic evaluation, DreamX-World 1.0 achieves a camera-control score of 73.75 and an overall score of 84.76, outperforming HY-WorldPlay 1.5 and LingBot-World, which achieve overall scores of 80.79 and 80.45, respectively.
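
The abstract does not say how camera-geometry-based retrieval scores earlier views. As a rough illustration only, the sketch below ranks stored views by a hypothetical score that multiplies camera-position proximity by viewing-direction agreement. The `MemoryView` type, the `pose_score` and `retrieve_views` functions, and the `distance_scale` parameter are all assumptions made for this example, not the paper's method or API.

```python
import math
from dataclasses import dataclass


@dataclass
class MemoryView:
    """A previously generated view and its camera pose (hypothetical structure)."""
    frame_index: int
    position: tuple  # camera center (x, y, z) in world coordinates
    forward: tuple   # unit viewing direction (x, y, z)


def _dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def pose_score(query, candidate, distance_scale=1.0):
    """Assumed scoring rule: nearby cameras looking the same way score close to 1."""
    dist = math.dist(query.position, candidate.position)
    proximity = math.exp(-dist / distance_scale)  # 1 at zero distance, decays with distance
    alignment = max(0.0, _dot(query.forward, candidate.forward))  # opposite-facing views get 0
    return proximity * alignment


def retrieve_views(query, memory, k=2):
    """Return the k stored views whose camera pose best matches the query pose."""
    return sorted(memory, key=lambda v: pose_score(query, v), reverse=True)[:k]


memory = [
    MemoryView(0, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)),
    MemoryView(40, (5.0, 0.0, 0.0), (0.0, 0.0, 1.0)),
    MemoryView(80, (0.2, 0.0, 0.1), (0.0, 0.0, -1.0)),  # close by, but facing away
]
query = MemoryView(120, (0.1, 0.0, 0.0), (0.0, 0.0, 1.0))

best = retrieve_views(query, memory, k=1)
print([v.frame_index for v in best])  # prints [0]
```

In this toy setup the view at frame 80 is nearly co-located with the query but faces the opposite direction, so it is ranked last. A real system would more likely compare field-of-view overlap or projected scene content than raw pose, which the abstract leaves unspecified.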

Community

Great work! Looking forward to the code and checkpoints of DreamX-World 1.0!

A very interesting work!
Featuring:
🎮 Real-time interaction
🧠 Long-term memory
🎨 Multi-style support
⏱️ 1-minute continuous generation
