Hugging Face Daily Papers · June 2, 2026 · 4 min read

StressDream: Steering Video World Models for Robust Policy Evaluation and Improvement

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

Authors: Junwon Seo, Sushant Veer, Ran Tian, Wenhao Ding, Apoorva Sharma, Karen Leung, Edward Schmerling, Marco Pavone, Andrea Bajcsy

Project page: https://junwon.me/StressDream/
GitHub: https://github.com/CMU-IntentLab/StressDream

arxiv:2606.00267

Published on May 29 · Submitted by taesiri on Jun 2

Abstract

AI-generated summary: StressDream enhances video world models by steering diffusion-based imaginations toward high-impact yet plausible outcomes through optimized noise initialization with semantic and plausibility objectives.

Video world models (WMs) have shown promise for policy evaluation and improvement by imagining realistic future observations conditioned on ego-robot actions. While WMs can model distributions over futures, policy evaluation and improvement typically rely on nominal imaginations, which can miss high-impact outcomes of robot actions unless prohibitively many samples are drawn. To enable robust policy evaluation and improvement over WM imaginations, we propose StressDream, which steers imaginations toward high-impact yet plausible outcomes specified at inference time by optimizing the initial noise of diffusion-based WMs. However, optimizing high-dimensional noise is challenging: the optimization must reason about nuanced, scene-dependent target events in generated videos while avoiding out-of-distribution (OOD) noise that yields implausible imaginations. We address this with two complementary objectives: a semantic objective with a Vision-Language Model that provides informative gradients by reasoning about the generated video, and a plausibility objective that prevents the optimized noise from drifting OOD. With state-of-the-art video world models for autonomous driving and robotic manipulation, we show that StressDream effectively steers imaginations toward high-impact yet plausible outcomes specified by text at inference time, such as task failures, enabling robust policy evaluation and improvement by identifying actions whose plausible futures include undesirable outcomes. Video results are available at https://junwon.me/StressDream/.
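The two-objective noise optimization described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration and not the authors' implementation: `semantic_score` is a linear stand-in for a VLM's judgment of the generated video, the plausibility penalty simply keeps the squared noise norm near its typical value for standard Gaussian noise, and `steer_noise`, `plaus_weight`, and the step sizes are hypothetical names and values. The abstract does not specify the actual form of either objective.

```python
import math

def semantic_score(z, w):
    """Toy stand-in for a VLM score of the generated video (higher = closer to the target event)."""
    return sum(wi * zi for wi, zi in zip(w, z))

def norm_ratio(z):
    """||z||^2 / d, which concentrates near 1 for standard Gaussian noise."""
    return sum(zi * zi for zi in z) / len(z)

def steer_noise(z0, w, plaus_weight, lr=0.05, steps=2000):
    """Gradient descent on  -semantic_score(z) + plaus_weight * (||z||^2/d - 1)^2."""
    z = list(z0)
    d = len(z)
    for _ in range(steps):
        r = norm_ratio(z)
        # Gradient of the semantic term is -w; gradient of the penalty is
        # plaus_weight * 4 * (r - 1) * z / d.
        z = [zi - lr * (-wi + plaus_weight * 4.0 * (r - 1.0) * zi / d)
             for zi, wi in zip(z, w)]
    return z

d = 16
w = [1.0 / math.sqrt(d)] * d                           # assumed "target event" direction
z0 = [1.0 if i % 2 == 0 else -1.0 for i in range(d)]   # nominal noise: score 0, norm ratio 1

steered = steer_noise(z0, w, plaus_weight=10.0)        # both objectives
unconstrained = steer_noise(z0, w, plaus_weight=0.0)   # semantic objective only

print("steered:      ", semantic_score(steered, w), norm_ratio(steered))
print("unconstrained:", semantic_score(unconstrained, w), norm_ratio(unconstrained))
```

In this toy setting, dropping the penalty lets the noise norm grow far beyond what a Gaussian sample would have, which mirrors the OOD drift the abstract warns about. With the penalty, the score still rises while the norm stays close to its nominal value.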

Community

taesiri (paper submitter): StressDream steers video world model imaginations toward high-impact, plausible outcomes using VLM-guided noise optimization to enable robust policy evaluation for robotics and autonomous driving.
