Hallucination in World Models is Predictable and Preventable
Authors: Nicklas Hansen, Xiaolong Wang
Organization: University of California at San Diego
Published: 2026-06-25
Interactive paper: https://www.nicklashansen.com/mmbench2
Live demo: https://www.nicklashansen.com/mmbench2/#live-demo
Paper: https://arxiv.org/abs/2606.27326
Code: https://github.com/nicklashansen/mmbench2
Dataset: https://huggingface.co/datasets/nicklashansen/mmbench2
Models: https://huggingface.co/nicklashansen/mmbench2-models
Abstract
World models exhibit hallucinations in low-data regions of state-action space, which can be detected and mitigated using data-centric signals and coverage-aware sampling techniques.
Modern generative world models render increasingly realistic action-controllable futures, yet they frequently hallucinate: rollouts remain visually fluent while drifting from the ground-truth dynamics. We hypothesize that hallucination concentrates in low-coverage regions of the state-action space, where lightweight data-centric signals can both detect it and guide mitigation. To test this, we introduce MMBench2, a 427-hour, 210-task dataset for visual world modeling with ground-truth actions, rewards, and live simulators, and train a 350M-parameter world model on it. We identify three distinct hallucination modes (perceptual, action-marginalized, and scene-diverging), each anchored to a different stage of the pipeline, and develop three signals that accurately predict where the model will fail. To close coverage gaps at training time, we develop a coverage-aware sampling technique; to close them online, our hallucination predictors serve as curiosity rewards for targeted data collection, yielding a data-efficient finetuning recipe that adapts the pretrained world model to entirely unseen environments with as few as 50 real environment trajectories. Overall, our findings reveal that hallucination in world models is inherently a data coverage issue, and that the same signals used to detect it can also be used for mitigation.
An interactive web version of our paper is available at https://www.nicklashansen.com/mmbench2
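The abstract does not say how the coverage-aware sampling technique works. As a rough illustration of the general idea only, the sketch below uses inverse-frequency weighting over discretized regions of state-action space, so that samples from rarely visited regions are drawn more often during training. This is a generic technique swapped in for illustration, not the authors' method; the names `coverage_weights`, `sample_batch`, `bin_ids`, and `alpha` are all hypothetical.

```python
import random
from collections import Counter


def coverage_weights(bin_ids, alpha=1.0):
    """Per-sample sampling probabilities proportional to count(bin) ** -alpha.

    bin_ids: one hypothetical coverage-bin label per training sample.
    alpha:   0 gives uniform sampling over samples; 1 gives every
             occupied bin the same total probability mass.
    """
    counts = Counter(bin_ids)
    raw = [counts[b] ** -alpha for b in bin_ids]
    total = sum(raw)
    return [w / total for w in raw]


def sample_batch(bin_ids, batch_size, alpha=1.0, seed=0):
    """Draw sample indices (with replacement) using the coverage weights."""
    rng = random.Random(seed)
    probs = coverage_weights(bin_ids, alpha)
    return rng.choices(range(len(bin_ids)), weights=probs, k=batch_size)


# Toy data (assumed): bin 0 is well covered (8 samples), bin 1 is rare (2 samples).
bins = [0] * 8 + [1] * 2
probs = coverage_weights(bins)
batch = sample_batch(bins, batch_size=5)
```

With `alpha=1.0`, the two bins in the toy data each receive half of the total probability mass, so an individual sample from the rare bin is drawn four times as often as one from the common bin. Intermediate values of `alpha` trade off between uniform sampling and fully balanced coverage.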