Hugging Face Daily Papers · June 22, 2026 · 4 min read

WorldLines: Benchmarking and Modeling Long-Horizon Stateful Embodied Agents

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

To assist humans over extended periods in real homes, embodied agents must remember user routines, world states, and past interactions. Existing long-term memory benchmarks mainly evaluate language-centric retrieval and question answering, while embodied benchmarks often focus on short-horizon task execution without testing long-term memory use in dynamic environments. We introduce WorldLines, a project-driven benchmark for long-horizon embodied household assistance. It constructs temporally extended household traces with dialogues, actions, execution feedback, object and device state changes, and converts them into evidence-linked samples for Memory QA and Embodied Task Planning. We further propose ObsMem, an observer-grounded memory framework that maintains visibility-aware memories and action-native state trails for state-aware decisions. Experiments reveal persistent challenges in partial observability, overwritten world states, and translating long-term memory into embodied plans, while ObsMem offers a stronger reference architecture for this setting.
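To make two of the challenges named above concrete (partial observability and overwritten world states), here is a minimal Python sketch. It is not the paper's data format or the ObsMem implementation: the `StateEvent` and `StateTrail` names, their fields, and the thermostat example are all hypothetical, chosen only to show how the true state of a device can diverge from the last state an agent actually observed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class StateEvent:
    """One state change in a household trace (hypothetical schema)."""
    step: int            # position in the trace timeline
    entity: str          # object or device, e.g. "thermostat"
    value: str           # state after the change
    seen_by_agent: bool  # False if the change happened out of the agent's view


class StateTrail:
    """Append-only log of state changes; events are assumed to arrive in time order."""

    def __init__(self) -> None:
        self._events: List[StateEvent] = []

    def record(self, event: StateEvent) -> None:
        self._events.append(event)

    def true_state(self, entity: str) -> Optional[str]:
        """Latest value regardless of visibility (ground truth)."""
        return self._latest(entity, observed_only=False)

    def believed_state(self, entity: str) -> Optional[str]:
        """Latest value the agent actually observed."""
        return self._latest(entity, observed_only=True)

    def _latest(self, entity: str, observed_only: bool) -> Optional[str]:
        # Walk backwards so later events overwrite earlier ones.
        for event in reversed(self._events):
            if event.entity != entity:
                continue
            if observed_only and not event.seen_by_agent:
                continue
            return event.value
        return None


# Illustrative trace: two observed changes, then one the agent did not see.
trail = StateTrail()
trail.record(StateEvent(1, "thermostat", "20C", seen_by_agent=True))
trail.record(StateEvent(2, "thermostat", "23C", seen_by_agent=True))
trail.record(StateEvent(3, "thermostat", "18C", seen_by_agent=False))

truth = trail.true_state("thermostat")       # "18C"
belief = trail.believed_state("thermostat")  # "23C"
```

In this toy trace the agent's belief (`23C`) already reflects one overwrite of the earlier `20C` value, yet it is stale relative to the unobserved change to `18C`. A plan built on the believed state would act on outdated information, which is the kind of gap a visibility-aware memory has to track.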

arxiv:2606.18847

Published on Jun 17 · Submitted by Yehang Zhang on Jun 22

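Authors: Yehang Zhang, Jianchong Su, Haojian Huang, Yifan Chang, Tianhao Zhou, Xinli Xu, Yingjie Xu, Yinchuan Li, Zexi Li, Ying-Cong Chen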

Abstract

The WorldLines benchmark evaluates long-term memory in embodied agents through household scenarios, while the ObsMem framework targets the challenges of partial observability and of translating memory into decisions.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Get this paper in your agent:

hf papers read 2606.18847

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash
