AnchorWorld: Embodied Egocentric World Simulation with View-based Evolution Customization
Published on Jun 5, 2026
Submitted by yu li on Jun 8, 2026
Authors: Yu Li, Menghan Xia, Gongye Liu, Xintao Wang, Conglang Zhang, Lei Ke, Yuxuan Lin, Ruihang Chu, Pengfei Wan, Kun Gai, Yujiu Yang
Project page: https://yuli0103.github.io/AnchorWorld/
arXiv: 2606.07326
Abstract
AnchorWorld advances egocentric simulation through enhanced interaction integrity and flexible world customization using 3D human motion and anchor view definitions.
Interactive world modeling is a pivotal frontier, yet the versatile controllability that practical scenarios require remains underexplored. To bridge this gap, we present AnchorWorld, a framework that advances egocentric simulation through enhanced interaction integrity and a flexible mechanism for world customization. First, we use 3D human motion as the primary interaction modality. To compensate for body parts that are out of view or truncated in egocentric views, we introduce auxiliary training supervision that incorporates exogenous viewpoints decoupled from the agent's first-person sensorium. This supervision lets the model observe the agent's full-body positioning relative to the environment, which facilitates more robust spatial grounding of human-world interactions. Furthermore, we propose a simple yet effective mechanism for customizing self-evolving worlds: anchor views are defined within a unified world coordinate system and paired with textual descriptions that dictate the dynamic evolution of local scenes. Experimental results show that AnchorWorld significantly outperforms state-of-the-art baselines, and ablation studies validate the effectiveness of our key designs. Notably, our customization scheme exhibits promising spatio-temporal geometric consistency and adheres strictly to the prescribed evolutionary dynamics.
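The abstract does not say how anchor views are represented, so the following is only an illustrative sketch of the idea of "anchor views within a unified world coordinate system, coupled with textual descriptions". Every name here (`AnchorView`, `world_to_camera`, `anchors_in_view`, the viewing-cone test, the example prompts) is a hypothetical assumption for illustration and is not taken from the paper or its code. The sketch stores each anchor as a world-frame pose plus a text prompt and checks which anchors fall inside an egocentric camera's viewing cone.

```python
import math
from dataclasses import dataclass

Vec3 = tuple[float, float, float]
Mat3 = tuple[Vec3, Vec3, Vec3]  # rows of a camera-to-world rotation matrix

IDENTITY: Mat3 = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))


@dataclass(frozen=True)
class AnchorView:
    """Hypothetical record: a pose in the shared world frame plus a text prompt."""
    position: Vec3        # anchor camera centre in world coordinates
    rotation: Mat3        # anchor camera-to-world rotation
    evolution_text: str   # how the local scene should evolve over time


def world_to_camera(point_w: Vec3, cam_position: Vec3, cam_rotation: Mat3) -> Vec3:
    """Express a world-frame point in a camera's local frame."""
    d = tuple(p - c for p, c in zip(point_w, cam_position))
    # The rotation maps camera axes to world axes, so its transpose inverts it:
    # component j of R^T d is the dot product of d with column j of R.
    return tuple(sum(cam_rotation[i][j] * d[i] for i in range(3)) for j in range(3))


def anchors_in_view(anchors, cam_position, cam_rotation,
                    half_fov_deg=45.0, max_dist=10.0):
    """Return anchors whose position lies inside a simple viewing cone.

    Assumes the camera looks along its local +z axis (an assumption of this sketch).
    """
    cos_limit = math.cos(math.radians(half_fov_deg))
    visible = []
    for anchor in anchors:
        p_cam = world_to_camera(anchor.position, cam_position, cam_rotation)
        dist = math.sqrt(sum(c * c for c in p_cam))
        if dist == 0.0 or dist > max_dist:
            continue  # coincident with the camera, or too far away
        if p_cam[2] / dist >= cos_limit:
            visible.append(anchor)
    return visible


# Made-up example data: one anchor in front of the camera, one behind it.
anchors = [
    AnchorView((0.0, 0.0, 3.0), IDENTITY, "a campfire slowly burns down"),
    AnchorView((0.0, 0.0, -3.0), IDENTITY, "snow gradually covers the path"),
]
visible = anchors_in_view(anchors, (0.0, 0.0, 0.0), IDENTITY)
print([a.evolution_text for a in visible])  # ['a campfire slowly burns down']
```

In a setup like this, the prompts of the visible anchors could be passed to a generator as conditioning for the current egocentric frame. How AnchorWorld actually consumes anchor views is not stated in the abstract.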
Community
We propose AnchorWorld, a framework that combines embodied egocentric action control with world customization. AnchorWorld enables human-motion-driven exploration and interaction within customizable, self-evolving worlds.