Hugging Face Daily Papers · 4 min read

Echo-Infinity: Learning Evolving Memory for Real-Time Infinite Video Generation

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2606.04527

Published on Jun 3
· Submitted by Yuxuan BIAN on Jun 4

Authors: Yuxuan Bian, Zeyue Xue, Songchun Zhang, Shiyi Zhang, Weiyang Jin, Yaowei Li, Junhao Zhuang, Haoran Li, Jie Huang, Haoyang Huang, Nan Duan, Qiang Xu

Project page: https://echo-team-joy-future-academy-jd.github.io/Echo-Infinity/
Code: https://github.com/Echo-Team-Joy-Future-Academy-JD/Echo-Infinity

Abstract

Echo-Infinity targets real-time infinite video generation, using a learnable evolving memory and unified relative RoPE to overcome limitations of existing autoregressive methods.

We present Echo-Infinity, an autoregressive (AR) framework towards real-time infinite video generation that employs a learnable evolving memory to dynamically filter, abstract, and compress history of any length at constant cost. Existing methods mainly curate memory with predefined KV-cache schedules, fixed-ratio heuristic compression, or inference-time RoPE adaptation. These designs inevitably lose historical information and amplify compounding errors because their cache window is limited and they ignore the noise introduced by autoregressive generation. Inspired by human memory consolidation, Echo-Infinity replaces handcrafted memory curation with learnable Memory Queries, which are updated by attention and a gating mechanism when past frames are evicted from the local window. The queries are optimized end-to-end with the video diffusion transformers (DiTs), forming an evolving memory that supports arbitrary compression ratios with constant computation independent of video length. They also act as a generalizable generation prior, improving quality even when only the optimized initial state is used. We further introduce a Unified Relative RoPE recipe, which anchors the sink frames to start from id 0 and lets the newest frame id grow at most to the DiTs' pretrained maximum temporal RoPE id throughout training and inference, freeing the model from the finite RoPE constraint and closing the train-test RoPE extrapolation gap. In long and short video generation, Echo-Infinity achieves state-of-the-art performance and, to our knowledge, demonstrates promising 24-hour (>1.3M frames) real-time rollouts for the first time, suggesting a practical path toward infinite video generation.
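
The memory update described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`update_memory`, `rollout`), the scalar sigmoid gate, the gate input, and all sizes are assumptions chosen only to show how a fixed set of queries can absorb evicted frames through attention plus gating while the memory size stays constant. In the paper the queries are trained end-to-end with the DiT; here they are random.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]


def update_memory(memory, evicted_tokens, gate_bias=0.0):
    """Fold the tokens of one evicted frame into a fixed-size memory.

    Hypothetical update rule: each query attends over the evicted
    tokens, then a scalar sigmoid gate blends the attention read-out
    into the query. The gate form is an assumption for illustration.
    """
    dim = len(memory[0])
    scale = 1.0 / math.sqrt(dim)
    new_memory = []
    for q in memory:
        weights = softmax([dot(q, t) * scale for t in evicted_tokens])
        read = [sum(w * t[i] for w, t in zip(weights, evicted_tokens))
                for i in range(dim)]
        gate = 1.0 / (1.0 + math.exp(-(dot(q, read) * scale + gate_bias)))
        new_memory.append([(1.0 - gate) * qi + gate * ri
                           for qi, ri in zip(q, read)])
    return new_memory


def rollout(num_frames, window=4, num_queries=2, dim=8,
            tokens_per_frame=3, seed=0):
    """Stream random 'frames'; evict the oldest one into memory."""
    rng = random.Random(seed)
    memory = [[rng.gauss(0, 1) for _ in range(dim)]
              for _ in range(num_queries)]
    local_window = []
    for _ in range(num_frames):
        frame = [[rng.gauss(0, 1) for _ in range(dim)]
                 for _ in range(tokens_per_frame)]
        local_window.append(frame)
        if len(local_window) > window:
            evicted = local_window.pop(0)
            memory = update_memory(memory, evicted)
    return memory, local_window
```

Whether `rollout` streams 10 frames or 500, it returns the same number of queries and the same window length, so the per-step cost in this sketch does not depend on how long the video already is.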

Community

Paper submitter about 7 hours ago

Introducing Echo-Infinity 🚀

An autoregressive framework that moves toward real-time infinite video generation.

Echo-Infinity is powered by two simple but effective recipes:

✨ an evolving learnable memory query
✨ unified relative RoPE across training and inference
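
One way to read the second recipe is sketched below, based only on the abstract's description: sink frames keep ids starting at 0, and the newest frame's temporal id is capped at the pretrained maximum. The function `relative_temporal_ids`, its parameters, and the choice to number the local window backwards from the newest frame are assumptions for illustration, not the paper's exact scheme.

```python
def relative_temporal_ids(num_generated, num_sink, window, max_rope_id):
    """Assign temporal RoPE ids for one generation step (hypothetical).

    num_generated: frames produced so far, including the newest one.
    num_sink:      number of sink frames, anchored at ids 0..num_sink-1.
    window:        maximum number of recent frames in the local window.
    max_rope_id:   largest temporal id seen in pretraining (assumed known).
    """
    if max_rope_id < num_sink + window - 1:
        raise ValueError("max_rope_id too small for sink + window")
    sink_ids = list(range(min(num_sink, num_generated)))
    num_local = min(window, max(0, num_generated - num_sink))
    if num_local == 0:
        return sink_ids, []
    # The newest id follows the absolute index until it reaches the cap,
    # then stays at the cap for the rest of the rollout.
    newest = min(num_generated - 1, max_rope_id)
    local_ids = list(range(newest - num_local + 1, newest + 1))
    return sink_ids, local_ids
```

With `num_sink=2`, `window=3`, and `max_rope_id=20`, frame 5 sees local ids `[2, 3, 4]` and frame 1000 sees `[18, 19, 20]`. Under this reading, ids never exceed the pretrained range no matter how long the rollout runs.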

Check out our 24-hour real-time generation demo!
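
As a quick sanity check on the scale of that demo, the abstract's ">1.3M frames" figure can be related to a 24-hour duration. The 16 fps value below is an assumption used only for illustration; the page does not state the frame rate.

```python
SECONDS_PER_HOUR = 3600


def frames_for_duration(hours, fps):
    """Number of frames in a video of the given length and frame rate."""
    return hours * SECONDS_PER_HOUR * fps


def implied_min_fps(min_frames, hours):
    """Lowest average frame rate consistent with a frame-count lower bound."""
    return min_frames / (hours * SECONDS_PER_HOUR)


# 1.3M frames over 24 hours implies an average of just over 15 frames per second.
lower_bound_fps = implied_min_fps(1_300_000, 24)

# At an assumed 16 fps, 24 hours would be 1,382,400 frames, consistent with ">1.3M".
frames_at_16fps = frames_for_duration(24, 16)
```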


