Hugging Face Daily Papers · 6 min read

CRONOS: Benchmarking Counterfactual Physical Consistency in Video Models

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

Project page: https://genintel.github.io/CRONOS/
Code: https://github.com/GenIntel/CRONOS-benchmark
arxiv:2605.23699

Published on May 22, 2026 · Submitted by Artur Jesslen on May 26, 2026
Authors: León Begiristain, Olaf Dünkel, Adam Kortylewski

Abstract

AI-generated summary: CRONOS is a benchmark for evaluating counterfactual physical consistency in video prediction models through controlled interventions in viewpoint, scene, object category, and appearance while maintaining fixed physical event types.

Video prediction is increasingly viewed as a path toward generalizable world models, yet it remains unclear whether these systems learn underlying causal structure or merely exploit superficial visual correlations for future prediction. We introduce CRONOS, an intervention-based benchmark designed to evaluate counterfactual physical consistency: whether a model's predictions of physical events respond appropriately to controlled changes in the visual input, such as variations of scene context, viewpoint, object appearance, and object category. Built in a photorealistic Unreal Engine environment, CRONOS enables controlled, high-fidelity generation of videos across diverse scenes and dynamics. In contrast to previous benchmarks, CRONOS systematically intervenes on four key factors - viewpoint, scene, object category, and object appearance - while keeping the underlying physical event type, such as a collision, occlusion, or fall, fixed. Our evaluation of recent open-source video generators reveals substantial failures in counterfactual physical consistency: prediction quality for the same physical event type is affected by appearance, environment, and, particularly, by viewpoint changes. CRONOS provides a controlled and reproducible testbed for diagnosing how the quality of generated videos changes for different interventions, establishing a concrete target for developing models that perform consistently across changes of multiple conditions. The dataset and code are available at our project page.
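The intervention design described in the abstract can be illustrated with a minimal sketch. It is not the CRONOS code: the factor levels, the `toy_score` stand-in for a quality metric, and the `marginal_gap` summary are all hypothetical, chosen only to show how a full-factorial grid over the four named factors lets one ask which factor moves prediction quality while the event type stays fixed.

```python
from itertools import product
from statistics import mean

# Hypothetical levels; the abstract names the four factors but not these values.
FACTORS = {
    "viewpoint": ["front", "side", "top"],
    "scene": ["kitchen", "warehouse"],
    "object_category": ["ball", "box"],
    "object_appearance": ["red", "textured"],
}


def factorial_conditions(factors):
    """Enumerate every combination of factor levels (a full-factorial grid)."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


def marginal_gap(scores, factor):
    """Best-minus-worst mean score across the levels of one factor.

    `scores` is a list of (condition, score) pairs. A gap near zero suggests
    the score is insensitive to that factor; a large gap suggests it is not.
    """
    by_level = {}
    for cond, score in scores:
        by_level.setdefault(cond[factor], []).append(score)
    level_means = [mean(values) for values in by_level.values()]
    return max(level_means) - min(level_means)


def toy_score(cond):
    """Stand-in for a real quality metric: only the top view is penalized."""
    return 0.5 if cond["viewpoint"] == "top" else 0.9


# One fixed event type (say, a collision) rendered under every condition.
conditions = factorial_conditions(FACTORS)
scores = [(cond, toy_score(cond)) for cond in conditions]
gaps = {factor: marginal_gap(scores, factor) for factor in FACTORS}
print(gaps)
```

In this toy setup only the viewpoint gap is non-zero, because the stand-in metric was built that way. With a real model and metric, the same per-factor summary would be one simple way to read off which intervention hurts most.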

Community

Paper submitter about 17 hours ago

An intervention-based benchmark designed to evaluate counterfactual physical consistency: whether a model’s predictions of physical events respond appropriately to controlled changes in the visual input, such as variations of scene context, viewpoint, object appearance, and object category.

Avi (avahal) · May 26, 2026

the full-factorial intervention design in CRONOS is where the paper shines, isolating viewpoint, scene context, object appearance, and object category while keeping the core event fixed. but i worry the object-centric metrics like appearance stability or 3d-shape stability can be steered by textures or backgrounds, so a model might pass the test by rendering plausible visuals across views rather than truly modeling the physics. i'd like to see a stricter causal test, where you perturb motion cues or disable texture signals and check if the predicted outcomes still align with physical laws under those counterfactuals. the arxivlens breakdown helped me parse the method details, and it's nice how it distills section-by-section insights (https://arxivlens.com/PaperView/Details/cronos-benchmarking-counterfactual-physical-consistency-in-video-models-2068-b784eae8). a concrete next step could be a counterfactual that swaps viewpoint but fixes lighting and texture, to see if the model truly generalizes dynamics beyond appearance.
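The viewpoint-only counterfactual suggested in the comment above could be set up roughly as follows. This is a hedged sketch, not part of CRONOS or the commenter's proposal in any concrete form: the condition dictionaries, the `lighting` and `texture` keys, and `toy_score` are assumptions made for illustration. The idea is to pair clips that differ in viewpoint and agree on everything else, then average the per-pair score difference.

```python
from statistics import mean


def viewpoint_only_pairs(conditions, base_view, alt_view):
    """Pair conditions that differ in viewpoint and match on every other factor."""
    def fixed_key(cond):
        return tuple(sorted((k, v) for k, v in cond.items() if k != "viewpoint"))

    by_key = {}
    for cond in conditions:
        by_key.setdefault(fixed_key(cond), {})[cond["viewpoint"]] = cond
    return [
        (views[base_view], views[alt_view])
        for views in by_key.values()
        if base_view in views and alt_view in views
    ]


def mean_paired_drop(pairs, score_fn):
    """Average score change when moving from the base view to the alternate view."""
    return mean(score_fn(base) - score_fn(alt) for base, alt in pairs)


def toy_score(cond):
    """Stand-in for a physics-consistency score; hypothetical values."""
    return 0.9 if cond["viewpoint"] == "front" else 0.6


# Hypothetical conditions; only the first two form a valid viewpoint-only pair.
conditions = [
    {"viewpoint": "front", "lighting": "noon", "texture": "matte"},
    {"viewpoint": "side", "lighting": "noon", "texture": "matte"},
    {"viewpoint": "front", "lighting": "dusk", "texture": "matte"},
    {"viewpoint": "side", "lighting": "dusk", "texture": "glossy"},
]

pairs = viewpoint_only_pairs(conditions, "front", "side")
drop = mean_paired_drop(pairs, toy_score)
print(len(pairs), drop)
```

Pairing before averaging keeps lighting and texture from leaking into the comparison: conditions without an exact partner are dropped rather than compared against a mismatched clip.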


Get this paper in your agent:

hf papers read 2605.23699
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
