Hugging Face Daily Papers · May 15, 2026 · 5 min read

RAVEN: Real-time Autoregressive Video Extrapolation with Consistency-model GRPO

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

Causal autoregressive video diffusion models support real-time streaming generation by extrapolating future chunks from previously generated content. Distilling such generators from high-fidelity bidirectional teachers yields competitive few-step models, yet a persistent gap between the history distributions encountered during training and those arising at inference constrains generation quality over long horizons. We introduce the Real-time Autoregressive Video Extrapolation Network (RAVEN), a training-time test framework that repacks each self-rollout into an interleaved sequence of clean historical endpoints and noisy denoising states. This formulation aligns training attention with inference-time extrapolation and allows downstream chunk losses to supervise the history representations on which future predictions depend. We further propose Consistency-model Group Relative Policy Optimization (CM-GRPO), which reformulates a consistency sampling step as a conditional Gaussian transition and applies online Reinforcement Learning (RL) directly to this kernel, avoiding the Euler–Maruyama auxiliary process adopted in prior flow-model RL formulations.
Experiments demonstrate that RAVEN surpasses recent causal video distillation baselines across quality, semantic, and dynamic degree evaluations, and that CM-GRPO provides further gains when combined with RAVEN.
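One possible reading of the repacking step, offered as a sketch under stated assumptions rather than the RAVEN implementation: each chunk of a self-rollout contributes a noisy denoising-state block and a clean endpoint block, laid out as noisy_0, clean_0, noisy_1, clean_1, and so on. A noisy block attends to the clean endpoints of earlier chunks plus itself, mirroring how inference denoises a chunk against a cache of clean history, while each clean block attends causally to clean blocks up to its own index. The block-level granularity, the ordering, and the names `interleaved_layout` and `block_attention_mask` are hypothetical.

```python
from typing import List, Tuple

Block = Tuple[str, int]  # ("noisy" | "clean", chunk index); hypothetical block label


def interleaved_layout(num_chunks: int) -> List[Block]:
    """Hypothetical packing order: noisy_0, clean_0, noisy_1, clean_1, ..."""
    layout: List[Block] = []
    for i in range(num_chunks):
        layout.append(("noisy", i))  # denoising state of chunk i
        layout.append(("clean", i))  # clean endpoint of chunk i, later used as history
    return layout


def block_attention_mask(num_chunks: int) -> List[List[bool]]:
    """mask[q][k] is True when query block q may attend to key block k."""
    layout = interleaved_layout(num_chunks)
    mask: List[List[bool]] = []
    for q_kind, q_idx in layout:
        row: List[bool] = []
        for k_kind, k_idx in layout:
            if k_kind == "clean":
                # Noisy chunk i sees clean history 0..i-1 (like an inference KV cache);
                # clean chunk i sees clean chunks 0..i (causal over endpoints).
                limit = q_idx if q_kind == "noisy" else q_idx + 1
                row.append(k_idx < limit)
            else:
                # Noisy states are private to their own chunk's denoising block.
                row.append(q_kind == "noisy" and k_idx == q_idx)
        mask.append(row)
    return mask


if __name__ == "__main__":
    names = [f"{kind[0]}{idx}" for kind, idx in interleaved_layout(3)]
    print("     " + " ".join(f"{n:>3}" for n in names))
    for name, row in zip(names, block_attention_mask(3)):
        print(f"{name:>3}: " + " ".join("  x" if v else "  ." for v in row))
```

Because noisy_i attends to clean_j (j < i) within a single forward pass in this layout, a loss on chunk i can backpropagate into the representations of its history, which is the property the abstract emphasizes. In practice such a mask would be expanded to token level before being passed to an attention kernel.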

arxiv:2605.15190

Published on May 14 · Submitted by Yanzuo Lu on May 15 · MVP Lab

Authors: Yanzuo Lu, Ronglai Zuo, Jiankang Deng

Abstract

AI-generated summary: RAVEN enables real-time video generation through causal autoregressive extrapolation with improved training alignment, while CM-GRPO enhances performance via reinforcement learning applied to consistency model sampling.
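The CM-GRPO idea can be illustrated with a minimal sketch, assuming a common multistep consistency sampler in which one step maps x_t to x_{t'} = alpha * x0_pred + sigma * eps with eps ~ N(0, I), where x0_pred is the model's clean estimate. Under that assumption the step is a Gaussian with mean alpha * x0_pred and standard deviation sigma, so its log-probability is available in closed form and can feed a GRPO-style clipped objective with group-normalized rewards, without an auxiliary SDE discretization. The kernel parameterization, the clip value 0.2, and all function names are illustrative assumptions; the paper's exact formulation may differ.

```python
import math
from typing import List, Sequence


def gaussian_log_prob(x: Sequence[float], mean: Sequence[float], std: float) -> float:
    """Log density of an isotropic Gaussian N(mean, std^2 I) evaluated at x."""
    d = len(x)
    sq_err = sum((a - m) ** 2 for a, m in zip(x, mean))
    return -0.5 * sq_err / std**2 - d * math.log(std) - 0.5 * d * math.log(2.0 * math.pi)


def consistency_step_log_prob(
    x_next: Sequence[float], x0_pred: Sequence[float], alpha: float, sigma: float
) -> float:
    """Assumed kernel: x_next = alpha * x0_pred + sigma * eps, eps ~ N(0, I).

    x0_pred stands for the consistency model's clean estimate f_theta(x_t, t);
    the step is then Gaussian with mean alpha * x0_pred and std sigma (> 0).
    """
    if sigma <= 0:
        raise ValueError("a deterministic step (sigma == 0) has no Gaussian density")
    mean = [alpha * v for v in x0_pred]
    return gaussian_log_prob(x_next, mean, sigma)


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantage: normalize each reward within its sampling group."""
    n = len(rewards)
    mu = sum(rewards) / n
    std = math.sqrt(sum((r - mu) ** 2 for r in rewards) / n)
    return [(r - mu) / (std + eps) for r in rewards]


def clipped_surrogate(logp_new: float, logp_old: float, advantage: float, clip: float = 0.2) -> float:
    """PPO/GRPO clipped objective for one transition (to be maximized)."""
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = min(max(ratio, 1.0 - clip), 1.0 + clip)
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    adv = group_relative_advantages([0.2, 0.5, 0.9, 0.4])
    lp_old = consistency_step_log_prob([0.1, -0.3], [0.2, -0.5], alpha=0.6, sigma=0.4)
    lp_new = lp_old + 0.05  # pretend the updated policy made this sample slightly likelier
    print([round(a, 3) for a in adv], clipped_surrogate(lp_new, lp_old, adv[2]))
```

A step with sigma = 0, such as a final deterministic step, has no density under this kernel, so a sketch like this would only score the stochastic steps of a rollout.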

arXiv: https://arxiv.org/abs/2605.15190 · Project page: https://yanzuo.lu/raven/ · GitHub: https://github.com/mvp-ai-lab/RAVEN (24 stars)

Community

oliveryanzuolu · Paper submitter · May 15, 2026

Model can be found at https://huggingface.co/mvp-lab/RAVEN .

Get this paper in your agent:

hf papers read 2605.15190

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 1

Datasets citing this paper 0

Spaces citing this paper 0

Collections including this paper 1

Discussion (0)
