Hugging Face Daily Papers · 5 min read

RAVEN: Real-time Autoregressive Video Extrapolation with Consistency-model GRPO

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

Project page: https://yanzuo.lu/raven/ · Code: https://github.com/mvp-ai-lab/RAVEN · Model: https://huggingface.co/mvp-lab/RAVEN
arxiv:2605.15190

Published on May 14, 2026 · Submitted by Yanzuo Lu on May 15, 2026
Authors: Yanzuo Lu, Ronglai Zuo, Jiankang Deng

Abstract

AI-generated summary: RAVEN enables real-time video generation through causal autoregressive extrapolation with improved training alignment, while CM-GRPO enhances performance via reinforcement learning applied to consistency model sampling.

Causal autoregressive video diffusion models support real-time streaming generation by extrapolating future chunks from previously generated content. Distilling such generators from high-fidelity bidirectional teachers yields competitive few-step models, yet a persistent gap between the history distributions encountered during training and those arising at inference constrains generation quality over long horizons. We introduce the Real-time Autoregressive Video Extrapolation Network (RAVEN), a training-time test framework that repacks each self-rollout into an interleaved sequence of clean historical endpoints and noisy denoising states. This formulation aligns training attention with inference-time extrapolation and allows downstream chunk losses to supervise the history representations on which future predictions depend. We further propose Consistency-model Group Relative Policy Optimization (CM-GRPO), which reformulates a consistency sampling step as a conditional Gaussian transition and applies online Reinforcement Learning (RL) directly to this kernel, avoiding the Euler–Maruyama auxiliary process adopted in prior flow-model RL formulations. Experiments demonstrate that RAVEN surpasses recent causal video distillation baselines across quality, semantic, and dynamic degree evaluations, and that CM-GRPO provides further gains when combined with RAVEN.
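To make the repacking idea in the abstract more concrete, here is a minimal Python sketch of one possible block-causal attention mask over an interleaved self-rollout. It is an illustration under stated assumptions, not the authors' implementation. Each chunk is collapsed to a single position (real chunks contain many tokens that would attend to each other within the chunk). The order [noisy_0, clean_0, noisy_1, clean_1, ...] and the names `Token`, `pack_rollout`, `attention_mask` and `visible_history` are hypothetical. The visibility rules are also an assumption about what "aligning training attention with inference-time extrapolation" could mean: a noisy denoising state of chunk i sees only the clean endpoints of earlier chunks (the history a streaming model would hold in its KV cache) plus itself, while each clean endpoint sees the clean history up to and including its own chunk.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Token:
    chunk: int  # index of the video chunk within one self-rollout
    kind: str   # "noisy" = denoising state, "clean" = historical endpoint


def pack_rollout(num_chunks: int) -> List[Token]:
    """Interleave one rollout as [noisy_0, clean_0, noisy_1, clean_1, ...] (assumed order)."""
    seq: List[Token] = []
    for i in range(num_chunks):
        seq.append(Token(i, "noisy"))
        seq.append(Token(i, "clean"))
    return seq


def attention_mask(seq: List[Token]) -> List[List[bool]]:
    """Return mask[q][k] == True when query position q may attend to key k.

    Assumed rules, chosen to mimic streaming inference with a KV cache:
    - a noisy state of chunk i sees clean endpoints of chunks < i, plus itself;
    - a clean endpoint of chunk i sees clean endpoints of chunks <= i.
    A noisy state is therefore visible only to itself.
    """
    n = len(seq)
    mask = [[False] * n for _ in range(n)]
    for q, tq in enumerate(seq):
        for k, tk in enumerate(seq):
            if tq.kind == "noisy":
                mask[q][k] = q == k or (tk.kind == "clean" and tk.chunk < tq.chunk)
            else:
                mask[q][k] = tk.kind == "clean" and tk.chunk <= tq.chunk
    return mask


def visible_history(seq: List[Token], mask: List[List[bool]], q: int) -> List[int]:
    """Chunk indices of the clean endpoints that position q attends to."""
    return [tk.chunk for k, tk in enumerate(seq) if mask[q][k] and tk.kind == "clean"]


seq = pack_rollout(3)
mask = attention_mask(seq)
# The noisy state of chunk 2 (position 4) conditions on clean chunks 0 and 1.
print(visible_history(seq, mask, 4))  # [0, 1]
```

Because the clean endpoints stay in the same forward pass instead of being cached and detached, a loss computed on the noisy state of chunk 2 depends on the clean positions of chunks 0 and 1 through attention. That is one way downstream chunk losses could supervise the history representations, while a single parallel pass reproduces what sequential KV-cached generation would see.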
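The CM-GRPO idea in the abstract can also be sketched under assumptions. In multistep consistency sampling, the model predicts a clean sample from x_t and then re-noises it to a lower noise level s. The step is therefore already a Gaussian, x_s ~ N(α_s f_θ(x_t, t), σ_s² I), with a closed-form log-likelihood. A policy-gradient method can score that kernel directly, instead of first converting a deterministic ODE sampler into an SDE discretized with Euler–Maruyama.

The Python below makes several simplifying assumptions: a rectified-flow schedule (α_s = 1 - s, σ_s = s), plain lists in place of model outputs, and a standard GRPO-style group-normalized advantage combined with a PPO-style clipped objective. The helper names, the schedule, the clip range, and the absence of a KL term are all assumptions rather than details taken from the paper.

```python
import math
import random
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def gaussian_log_prob(x: Sequence[float], mu: Sequence[float], sigma: float) -> float:
    """Log-density of x under N(mu, sigma^2 I), summed over all dimensions."""
    d = len(x)
    sq_err = sum((xi - mi) ** 2 for xi, mi in zip(x, mu))
    return -0.5 * sq_err / sigma**2 - d * math.log(sigma) - 0.5 * d * math.log(2 * math.pi)


def consistency_transition(
    x0_pred: Sequence[float], s: float, rng: random.Random
) -> Tuple[List[float], float]:
    """One stochastic consistency step viewed as a Gaussian policy.

    Assumed rectified-flow re-noising: x_s = (1 - s) * x0_pred + s * eps.
    Requires 0 < s < 1 so the kernel has a finite log-density.
    """
    if not 0.0 < s < 1.0:
        raise ValueError("s must lie in (0, 1)")
    mu = [(1.0 - s) * v for v in x0_pred]
    x_s = [m + s * rng.gauss(0.0, 1.0) for m in mu]
    return x_s, gaussian_log_prob(x_s, mu, s)


def group_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantages: normalize rewards within one prompt's group."""
    m, sd = mean(rewards), pstdev(rewards)
    return [(r - m) / (sd + eps) for r in rewards]


def clipped_policy_loss(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantages: Sequence[float],
    clip: float = 0.2,
) -> float:
    """PPO-style clipped surrogate, negated so that lower is better."""
    terms = []
    for ln, lo, a in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)
        clipped = min(max(ratio, 1.0 - clip), 1.0 + clip)
        terms.append(min(ratio * a, clipped * a))
    return -mean(terms)


# Toy usage: a group of 3 rollouts from the same prompt (values are made up).
rng = random.Random(0)
x0_preds = [[0.1, -0.2], [0.3, 0.0], [-0.1, 0.4]]  # stand-ins for f_theta(x_t, t)
steps = [consistency_transition(x0, s=0.5, rng=rng) for x0 in x0_preds]
logp_old = [lp for _, lp in steps]
advantages = group_advantages([0.2, 0.9, 0.5])  # hypothetical reward scores
loss = clipped_policy_loss(logp_old, logp_old, advantages)  # ratio = 1 before any update
```

In a real training loop, `x0_pred` would come from the generator with gradients enabled, `logp_old` would be recorded during the rollout, and `logp_new` would be recomputed for the same `x_s` under the current parameters. In this sketch the final step to s = 0 is deterministic and carries no log-probability.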


Get this paper in your agent:

hf papers read 2605.15190
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 1

Datasets citing this paper 0

Spaces citing this paper 0

Collections including this paper 1
