RAVEN: Real-time Autoregressive Video Extrapolation with Consistency-model GRPO
Authors: Yanzuo Lu, Ronglai Zuo, Jiankang Deng
Published: 2026-05-14
arXiv: 2605.15190
Project page: https://yanzuo.lu/raven/
Code: https://github.com/mvp-ai-lab/RAVEN
Model: https://huggingface.co/mvp-lab/RAVEN
Abstract
AI-generated summary: RAVEN enables real-time video generation through causal autoregressive extrapolation with improved training alignment, while CM-GRPO enhances performance via reinforcement learning applied to consistency model sampling.
Causal autoregressive video diffusion models support real-time streaming generation by extrapolating future chunks from previously generated content. Distilling such generators from high-fidelity bidirectional teachers yields competitive few-step models, yet a persistent gap between the history distributions encountered during training and those arising at inference constrains generation quality over long horizons. We introduce the Real-time Autoregressive Video Extrapolation Network (RAVEN), a training-time test framework that repacks each self-rollout into an interleaved sequence of clean historical endpoints and noisy denoising states. This formulation aligns training attention with inference-time extrapolation and allows downstream chunk losses to supervise the history representations on which future predictions depend. We further propose Consistency-model Group Relative Policy Optimization (CM-GRPO), which reformulates a consistency sampling step as a conditional Gaussian transition and applies online Reinforcement Learning (RL) directly to this kernel, avoiding the Euler-Maruyama auxiliary process adopted in prior flow-model RL formulations. Experiments demonstrate that RAVEN surpasses recent causal video distillation baselines across quality, semantic, and dynamic degree evaluations, and that CM-GRPO provides further gains when combined with RAVEN.
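The abstract's description of CM-GRPO can be illustrated with a small sketch. It is not the authors' implementation. It assumes a linear (rectified-flow style) noise schedule `x_s = (1 - s) * x0 + s * eps`, under which re-noising a predicted clean sample gives a Gaussian transition with a closed-form log-probability. It then applies the usual group-normalized advantage and a PPO-style clipped ratio. All function names, the schedule, and the clipping value are hypothetical.

```python
import math
import random


def gaussian_log_prob(x_next, mean, sigma):
    """Log density of an isotropic Gaussian N(mean, sigma^2 I) at x_next (sigma > 0)."""
    d = len(x_next)
    sq = sum((a - m) ** 2 for a, m in zip(x_next, mean))
    return -0.5 * sq / sigma ** 2 - d * math.log(sigma) - 0.5 * d * math.log(2 * math.pi)


def consistency_step(predict_clean, x_t, t, t_next, rng):
    """One stochastic consistency sampling step (assumed linear schedule).

    The model jumps to a clean estimate, which is then re-noised to level
    t_next, so x_next ~ N((1 - t_next) * x0_hat, t_next^2 I).
    A final step with t_next == 0 is deterministic and has no density.
    """
    x0_hat = predict_clean(x_t, t)
    mean = [(1.0 - t_next) * v for v in x0_hat]
    sigma = t_next
    x_next = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
    return x_next, mean, sigma


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of rollouts for the same prompt."""
    mu = sum(rewards) / len(rewards)
    var = sum((r - mu) ** 2 for r in rewards) / len(rewards)
    return [(r - mu) / (math.sqrt(var) + eps) for r in rewards]


def clipped_surrogate(logp_new, logp_old, advantage, clip=0.2):
    """PPO-style clipped objective for a single transition (to be maximized)."""
    ratio = math.exp(logp_new - logp_old)
    clipped = max(1.0 - clip, min(1.0 + clip, ratio))
    return min(ratio * advantage, clipped * advantage)


if __name__ == "__main__":
    rng = random.Random(0)
    toy_model = lambda x, t: [0.5 * v for v in x]  # hypothetical stand-in predictor
    x_next, mean, sigma = consistency_step(toy_model, [1.0, -1.0], 0.8, 0.5, rng)
    logp = gaussian_log_prob(x_next, mean, sigma)
    adv = group_relative_advantages([0.1, 0.4, 0.9])
    print(logp, adv, clipped_surrogate(logp, logp, adv[0]))
```

Because the transition density is available in closed form, the policy ratio can be evaluated on the sampler's own steps. This is one reading of how the formulation avoids an auxiliary Euler-Maruyama process.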