DecMem: Towards Minute-Long Consistent World Generation with Decoupled Memory
Authors: Zhenhao Yang, Xiaoshi Wu, Zhengyao Lv, Xiaoyu Shi, Xintao Wang, Pengfei Wan, Kun Gai, Kwan-Yee K. Wong
Organization: Kling Team
Published: 2026-05-29
arXiv: 2605.31336
Project page: https://jeffreyyzh.github.io/DecMem-Page/
Code: https://github.com/KlingAIResearch/DecMem
AI-generated summary
DecMem is a decoupled memory architecture for consistent long-horizon video generation that addresses the computational inefficiency and attention dispersion of learnable memory systems.
Abstract
Recent advances in video generative models have promoted rapid progress in controllable world models. However, maintaining fine-grained spatio-temporal consistency under long-horizon reasoning remains a key challenge. In this work, we move beyond explicit 3D memory and coarse frame-level implicit modeling, and propose a fine-grained, learnable, and scalable memory for consistent world generation. We first identify two fundamental limitations of naïve learnable memory architectures in long-horizon extrapolation, namely computational inefficiency and attention dispersion. Through a systematic analysis of attention dispersion, we propose DecMem, a decoupled memory architecture that employs Sparse Global Memory for efficient fine-grained access to global history and Anchored Local Memory for stable and high-quality extrapolation. Extensive experiments demonstrate that DecMem significantly outperforms current state-of-the-art methods. By ensuring precise and efficient long-term memory and achieving superior extrapolation capabilities, DecMem enables minute-level controllable long video generation with high fidelity and consistency.
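The abstract names attention dispersion as a limitation but does not define it. One common reading is that softmax attention spreads its weight ever more thinly as the number of memory keys grows. The toy sketch below illustrates that reading only, and shows how restricting attention to a small top-k subset keeps the weight concentrated. It is not DecMem's implementation: the function names, the scores, and the top-k rule are all hypothetical choices made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relevant_weight(num_keys, relevant_score=4.0, topk=None):
    """Attention weight placed on the one relevant key among num_keys keys.

    Hypothetical toy setup (an assumption, not from the paper): one key is
    relevant and has score `relevant_score`; every other key is a distractor
    with score 0.
    """
    scores = [relevant_score] + [0.0] * (num_keys - 1)
    if topk is not None:
        # Sparse access: keep only the topk highest-scoring keys
        # before applying the softmax.
        scores = sorted(scores, reverse=True)[:topk]
    # After the optional sort, the relevant key is still at index 0
    # because it has the highest score.
    return softmax(scores)[0]


if __name__ == "__main__":
    # Dense attention dilutes as history grows; top-k attention does not.
    for n in (16, 256, 4096):
        dense = relevant_weight(n)
        sparse = relevant_weight(n, topk=8)
        print(f"keys={n:5d}  dense={dense:.4f}  top-8={sparse:.4f}")
```

In this toy setting the dense weight on the relevant key is exp(s) / (exp(s) + N - 1), which shrinks toward zero as the history length N grows, while the top-k weight depends only on k. How DecMem actually selects memory entries and anchors local context is described in the paper and the linked code, not here.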