Hugging Face Daily Papers · May 21, 2026 · 3 min read

Enhancing Train-Free Infinite-Frame Generation for Consistent Long Videos

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.


Authors: X. Feng, J. Zhu, M. Wu, C. Chen, F. Mao, H. Guo, J. Wu, X. Chu, K. Huang

arXiv:2605.18233 · Published on May 18 · Submitted by xiaochonglinghu on May 21 · #3 Paper of the day · alibaba-inc


AI-generated summary

MIGA addresses long video generation challenges by reducing training-inference gaps and enhancing temporal consistency through dual consistency mechanisms.

Abstract

Without incurring significant computational overhead, train-free long video generation aims to enable foundation video generation models to produce longer videos. Frame-level autoregressive frameworks, e.g., FIFO-diffusion, offer the advantage of generating infinitely long videos with constant memory consumption. However, the mismatch between training and inference, coupled with the challenge of maintaining long-term consistency, limits the effective utilization of foundation models. To mitigate these concerns, we propose MIGA, a novel infinite-frame long video generation method. Firstly, we propose an effective two-stage alignment mechanism that mitigates the training-inference gap by reducing the excessive noise span fed to the model. We then introduce an innovative dual consistency enhancement mechanism, where the self-reflection approach corrects early high-noise frames and the long-range frame guidance approach leverages later low-noise frames with broad coverage to steer generation, jointly improving temporal consistency. Extensive experiments on VBench and NarrLV demonstrate the state-of-the-art performance of MIGA. Our project page is available at https://xiaokunfeng.github.io/miga_homepage/.
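The constant-memory property of frame-level autoregressive generation mentioned in the abstract can be illustrated with a toy sketch. This is not MIGA and not the FIFO-diffusion code: `toy_denoise`, `fifo_generate`, the scalar "latents", and the queue length are hypothetical stand-ins chosen only to keep the example runnable. It shows a fixed-length queue in which every frame sits at a different noise level, the cleanest frame is emitted at each step, and fresh noise is appended at the tail, so memory does not grow with video length. The spread of noise levels inside one queue is presumably the "noise span" the abstract refers to.

```python
import random
from collections import deque


def toy_denoise(latent: float, level: int) -> float:
    """Hypothetical stand-in for one denoising step of a video model.

    It only halves the value so the sketch stays runnable; a real model
    would denoise all frames in the window jointly.
    """
    return latent * 0.5


def fifo_generate(num_frames: int, queue_len: int = 4, seed: int = 0):
    """Return (frames, peak_queue_size) for a toy FIFO-style generator."""
    rng = random.Random(seed)
    # Each entry is [latent, noise_level]; head = cleanest, tail = noisiest.
    # Assumption: the queue is seeded with pure noise, so the first few
    # frames receive fewer denoising steps than later ones.
    queue = deque(
        [rng.gauss(0.0, 1.0), level] for level in range(1, queue_len + 1)
    )
    frames = []
    peak = len(queue)
    while len(frames) < num_frames:
        # One step: every frame in the queue moves down one noise level.
        for item in queue:
            item[0] = toy_denoise(item[0], item[1])
            item[1] -= 1
        # The head has reached noise level 0, so it is emitted as a frame.
        latent, level = queue.popleft()
        assert level == 0
        frames.append(latent)
        # A fresh, fully noisy frame joins the tail; queue size is unchanged.
        queue.append([rng.gauss(0.0, 1.0), queue_len])
        peak = max(peak, len(queue))
    return frames, peak


if __name__ == "__main__":
    frames, peak = fifo_generate(num_frames=100, queue_len=4)
    print(len(frames), peak)
```

Under these assumptions, asking for 10 frames or 100 frames leaves the peak queue size at `queue_len`; only the number of steps grows.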


Community

xiaochonglinghu (paper submitter), May 21, 2026:

Train-Free Infinite-Frame Generation for Consistent Long Videos (ICML26)


Get this paper in your agent:

hf papers read 2605.18233

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash
