WorldLines: Benchmarking and Modeling Long-Horizon Stateful Embodied Agents
Authors: Yehang Zhang, Jianchong Su, Haojian Huang, Yifan Chang, Tianhao Zhou, Xinli Xu, Yingjie Xu, Yinchuan Li, Zexi Li, Ying-Cong Chen
arXiv: 2606.18847
Published: 2026-06-17
Abstract
The WorldLines benchmark evaluates long-term memory in embodied agents through household scenarios, while the ObsMem framework addresses challenges in partial observability and in translating memory into decisions.
To assist humans over extended periods in real homes, embodied agents must remember user routines, world states, and past interactions. Existing long-term memory benchmarks mainly evaluate language-centric retrieval and question answering, while embodied benchmarks often focus on short-horizon task execution without testing long-term memory use in dynamic environments. We introduce WorldLines, a project-driven benchmark for long-horizon embodied household assistance. It constructs temporally extended household traces containing dialogues, actions, execution feedback, and object and device state changes, and converts them into evidence-linked samples for Memory QA and Embodied Task Planning. We further propose ObsMem, an observer-grounded memory framework that maintains visibility-aware memories and action-native state trails for state-aware decisions. Experiments reveal persistent challenges in handling partial observability, tracking overwritten world states, and translating long-term memory into embodied plans, while ObsMem offers a stronger reference architecture for this setting.
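The abstract does not specify how ObsMem stores its memories, so the following is only an illustrative sketch of what a visibility-aware state trail could look like, not the authors' implementation. All names here (`StateChange`, `StateTrail`, `latest`, the `observed` flag, and the mug example) are hypothetical. The sketch shows why overwritten states and partial observability interact: the agent's belief about an object can lag behind the true state when a change happened out of view.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class StateChange:
    """One entry in a household trace (hypothetical schema, not from the paper)."""
    step: int        # position of the event in the trace
    entity: str      # object or device, e.g. "mug"
    attribute: str   # e.g. "location" or "power"
    value: str       # new value after the change
    observed: bool   # True if the agent could see the change when it happened


@dataclass
class StateTrail:
    """Append-only log of state changes; newer entries overwrite older ones."""
    changes: List[StateChange] = field(default_factory=list)

    def record(self, change: StateChange) -> None:
        self.changes.append(change)

    def latest(
        self,
        entity: str,
        attribute: str,
        at_step: Optional[int] = None,
        observed_only: bool = True,
    ) -> Optional[StateChange]:
        """Return the most recent matching change up to at_step.

        With observed_only=True this is the agent's belief; with
        observed_only=False it is the ground-truth state. The returned
        record doubles as the evidence for the answer.
        """
        best: Optional[StateChange] = None
        for c in self.changes:
            if (c.entity, c.attribute) != (entity, attribute):
                continue
            if observed_only and not c.observed:
                continue
            if at_step is not None and c.step > at_step:
                continue
            if best is None or c.step >= best.step:
                best = c
        return best


# A mug is moved three times; the second move happens out of the agent's view.
trail = StateTrail()
trail.record(StateChange(1, "mug", "location", "kitchen counter", observed=True))
trail.record(StateChange(5, "mug", "location", "bedroom desk", observed=False))
trail.record(StateChange(9, "mug", "location", "sink", observed=True))

belief_mid = trail.latest("mug", "location", at_step=6)
truth_mid = trail.latest("mug", "location", at_step=6, observed_only=False)
belief_now = trail.latest("mug", "location")
```

In this toy trace, the agent's belief at step 6 still points to the kitchen counter while the true location is the bedroom desk; only after the observed change at step 9 do belief and truth agree again. A planner built on such a trail would need to treat stale beliefs as uncertain rather than as facts.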