Hugging Face Daily Papers · 5 min read

Foresight: Failure Detection for Long-Horizon Robotic Manipulation with Action-Conditioned World Model Latents

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2606.23085

Published on Jun 22, 2026 · Submitted by Haoran Zhang on Jun 23, 2026
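Authors: Haoran Zhang, Yifu Lu, Boyang Wang, Xuhui Kang, Yen-Ling Kuo, Zezhou Cheng, Mengdi Wang, Odest Chadwicke Jenkins
Organization: University of Michigan
Project page: https://haoranzhangumich.github.io/Forsight_web/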

Abstract

A failure detection framework for long-horizon robotic tasks uses action-conditioned world models and functional conformal prediction to monitor manipulation trajectories, requiring only final task-level success or failure labels for training.

Long-horizon tasks are common in real-world robotic deployments, yet failure detection for such tasks remains underexplored. Detecting failures in long-horizon robotic tasks is particularly challenging because failure onset is often ambiguous and dense temporal annotations are typically unavailable. We present Foresight, a failure detection framework that monitors manipulation trajectories using latent representations from an action-conditioned world model. Foresight is trained using only final task-level success or failure labels. By leveraging predictive world-model embeddings, our method provides a unified framework for failure detection across different policies. We further use functional conformal prediction (FCP) to calibrate detection thresholds adaptively. We evaluate Foresight with state-of-the-art vision-language-action policies in simulation on LIBERO-Long, ManiSkill-Long, and BEHAVIOR-1K, compare it against state-of-the-art failure detection methods, and validate it on real robots with three long-horizon tasks on a ReactorX-200 arm and one task on a Franka arm. Our results suggest that action-conditioned world-model embeddings provide a scalable representation for reliable failure monitoring in long-horizon manipulation.
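The abstract does not spell out how functional conformal prediction sets Foresight's thresholds. As a rough illustration only, the sketch below shows one standard split-conformal recipe for functional data, under our own assumptions: the monitor emits a scalar failure score at every timestep, all rollouts are padded to a common horizon, and thresholds are calibrated on successful rollouts alone. A per-timestep mean and spread are estimated on one split of successful rollouts; each curve in a disjoint calibration split contributes a single nonconformity score (its worst normalized exceedance), and the conformal quantile of those scores becomes a time-varying threshold u(t). The names `fit_threshold_band` and `first_alarm`, the padding step, and the toy numbers are hypothetical and not taken from the paper or its code.

```python
import math
from typing import List, Optional, Sequence

Curve = Sequence[float]  # one failure score per timestep (assumed input format)


def fit_threshold_band(fit_curves: Sequence[Curve],
                       calib_curves: Sequence[Curve],
                       alpha: float = 0.1) -> List[float]:
    """Split-conformal upper band u(t) from successful-rollout score curves.

    fit_curves and calib_curves must be disjoint sets of successful rollouts,
    all padded to the same horizon (a hypothetical preprocessing step).
    """
    horizon = len(fit_curves[0])
    n_fit = len(fit_curves)

    # Per-timestep center and spread, estimated on the fitting split only.
    mu = [sum(c[t] for c in fit_curves) / n_fit for t in range(horizon)]
    sigma = [
        max(math.sqrt(sum((c[t] - mu[t]) ** 2 for c in fit_curves) / n_fit), 1e-6)
        for t in range(horizon)
    ]

    # One nonconformity score per calibration curve: its worst normalized
    # exceedance over the whole horizon (a sup-norm style score).
    scores = sorted(
        max((c[t] - mu[t]) / sigma[t] for t in range(horizon))
        for c in calib_curves
    )

    # Conformal quantile: the ceil((n + 1)(1 - alpha))-th smallest score.
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    q = math.inf if k > n else scores[k - 1]  # too few curves: never alarm

    # Adaptive threshold: wider where successful rollouts vary more.
    return [mu[t] + q * sigma[t] for t in range(horizon)]


def first_alarm(curve: Curve, upper: Sequence[float]) -> Optional[int]:
    """Index of the first timestep whose score exceeds the band, else None."""
    for t, (score, bound) in enumerate(zip(curve, upper)):
        if score > bound:
            return t
    return None


if __name__ == "__main__":
    # Toy score curves from successful rollouts (made-up numbers).
    fit = [[0.0, 0.0, 0.0], [0.2, 0.2, 0.2]]
    calib = [[0.1, 0.15, 0.1], [0.1, 0.1, 0.2], [0.12, 0.1, 0.1], [0.1, 0.1, 0.1]]
    upper = fit_threshold_band(fit, calib, alpha=0.25)
    print(first_alarm([0.1, 0.1, 5.0], upper))  # 2
    print(first_alarm([0.1, 0.1, 0.1], upper))  # None
```

If a new successful rollout's score curve is exchangeable with the calibration curves, it stays at or below u(t) at every timestep with probability at least 1 - alpha, so alpha bounds the false-alarm rate on successful rollouts. Normalizing by the per-timestep spread is what makes the threshold adaptive rather than a single constant over the whole horizon.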
