Hugging Face Daily Papers · 4 min read

CausalCine: Real-Time Autoregressive Generation for Multi-Shot Video Narratives

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

Project page: https://yihao-meng.github.io/CausalCine/
arXiv: 2605.12496


Published on May 12 · Submitted by Yihao Meng on May 13

Authors: Yihao Meng, Zichen Liu, Hao Ouyang, Qiuyu Wang, Ka Leong Cheng, Yue Yu, Hanlin Wang, Haobo Li, Jiapeng Zhu, Yanhong Zeng, Xing Zhu, Yujun Shen, Qifeng Chen, Huamin Qu

Organization: Ant Group

AI-generated summary

CausalCine enables interactive, multi-shot video generation by addressing limitations of autoregressive models through causal modeling, dynamic memory routing, and real-time distillation techniques.

Abstract

Autoregressive video generation aims at real-time, open-ended synthesis. Yet, cinematic storytelling is not merely the endless extension of a single scene; it requires progressing through evolving events, viewpoint shifts, and discrete shot boundaries. Existing autoregressive models often struggle in this setting. Trained primarily for short-horizon continuation, they treat long sequences as extended single shots, inevitably suffering from motion stagnation and semantic drift during long rollouts. To bridge this gap, we introduce CausalCine, an interactive autoregressive framework that transforms multi-shot video generation into an online directing process. CausalCine generates causally across shot changes, accepts dynamic prompts on the fly, and reuses context without regenerating previous shots. To achieve this, we first train a causal base model on native multi-shot sequences to learn complex shot transitions prior to acceleration. We then propose Content-Aware Memory Routing (CAMR), which dynamically retrieves historical KV entries according to attention-based relevance scores rather than temporal proximity, preserving cross-shot coherence under bounded active memory. Finally, we distill the causal base model into a few-step generator for real-time interactive generation. Extensive experiments demonstrate that CausalCine significantly outperforms autoregressive baselines and approaches the capability of bidirectional models while unlocking the streaming interactivity of causal generation. Demo available at https://yihao-meng.github.io/CausalCine/
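The Content-Aware Memory Routing idea from the abstract can be illustrated with a minimal toy sketch: given the current query vector and a bank of cached key/value entries, select the top-k entries by attention relevance (query-key similarity) rather than by recency, so that the active memory stays bounded while distant but relevant shots remain retrievable. All names, shapes, and the scoring function below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def route_memory(query, keys, values, k=4):
    """Toy content-aware memory routing: score cached KV entries by
    attention relevance (dot product with the current query) and keep
    only the top-k, regardless of how recent each entry is."""
    scores = keys @ query                   # relevance of each cached key
    top = np.argsort(scores)[-k:][::-1]     # indices of k most relevant, best first
    return keys[top], values[top], top

rng = np.random.default_rng(0)
d = 8                                       # toy head dimension
keys = rng.standard_normal((32, d))         # 32 cached entries (oldest first)
values = rng.standard_normal((32, d))
query = rng.standard_normal(d)

k_sel, v_sel, idx = route_memory(query, keys, values, k=4)
print(idx)  # the selected entries need not be the most recent ones
```

In a recency-based cache, only the last few entries would survive eviction; here the selection depends purely on content relevance, which is the property the abstract credits with preserving cross-shot coherence under a bounded memory budget.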


Get this paper in your agent:

hf papers read 2605.12496
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash


