Hugging Face Daily Papers · 4 min read

FadeMem: Distance-Aware Memory Consolidation for Autoregressive Video Diffusion

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2606.10671

Published on Jun 9, 2026 · Submitted by YL on Jun 10, 2026
Authors:
Yu Lu, Junjie Yang, Piotr Koniusz, YuXin Song, Yi Yang

Abstract

FadeMem introduces a distance-aware key-value memory consolidation mechanism that organizes historical video data into a temporal hierarchy, improving long-video generation by preserving recent context and long-range anchors under fixed cache constraints.

Autoregressive video generators synthesize long videos by generating successive temporal segments, but their historical KV cache grows with video length. Existing bounded-cache methods reduce this cost with local windows, sink tokens, or compressed memory states, yet they usually assign fixed roles to different parts of the history. We propose FadeMem, a distance-aware KV memory consolidation mechanism that organizes historical KV blocks into a temporal hierarchy under a fixed cache budget. This design is motivated by frequency-dependent temporal decay: fine details decorrelate quickly, while coarse scene structure and identity remain useful over longer horizons. During generation, new history is inserted as fine-grained entries, while older adjacent entries are progressively merged under a power-law temporal allocation schedule, yielding a dense-near, sparse-far memory within one cache. Without architectural changes, FadeMem preserves recent context for short-term dynamics and compact long-range anchors for identity and scene coherence. Experiments show improved subject consistency, background stability, and temporal coherence over existing bounded-cache strategies.
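
The abstract names a power-law temporal allocation schedule but does not give its form. As a rough illustration only, the sketch below assumes one plausible reading: a cache of `budget` entries partitions the past frames so that the i-th boundary sits at distance `history_len * (i / budget) ** alpha` from the newest frame. The function name, the `alpha` exponent, and the rounding rule are hypothetical and are not taken from the paper.

```python
def power_law_boundaries(history_len: int, budget: int, alpha: float = 2.0) -> list[int]:
    """Partition `history_len` past frames into `budget` slots.

    Returns budget + 1 strictly increasing distances, where 0 is the newest
    frame. Slot i covers frames at distances [bounds[i], bounds[i + 1]).
    Assumption (not from the paper): boundaries follow (i / budget) ** alpha.
    """
    if budget < 1 or history_len < budget:
        raise ValueError("need 1 <= budget <= history_len")

    bounds = [0]
    for i in range(1, budget + 1):
        raw = history_len * (i / budget) ** alpha
        lowest = bounds[-1] + 1                # every slot holds at least one frame
        highest = history_len - (budget - i)   # leave one frame for each later slot
        bounds.append(min(max(round(raw), lowest), highest))
    return bounds


bounds = power_law_boundaries(history_len=64, budget=8, alpha=2.0)
spans = [far - near for near, far in zip(bounds, bounds[1:])]
print(spans)  # [1, 3, 5, 7, 9, 11, 13, 15]
```

With `alpha = 1` this partition is uniform. Larger values of `alpha` give recent frames more slots and push the compression onto distant history, which is the dense-near, sparse-far shape the abstract describes.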

Community

Paper submitter about 5 hours ago

We introduce FadeMem, a distance-aware KV memory consolidation method for long autoregressive video diffusion.

The core idea is simple: not all past frames should be treated equally. Recent frames are kept at higher resolution for short-term dynamics, while older history is progressively merged into compact long-range memory that preserves scene structure and identity. This gives a dense-near / sparse-far temporal memory under a fixed KV cache budget.
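
A minimal sketch of the insert-then-merge behavior described above, under stated assumptions: each KV block is stood in for by a short list of floats, merging is a span-weighted average, and the pair to merge is chosen by a hypothetical cost that penalizes coarse entries close to the present. None of these choices is confirmed by the source. `BoundedFadingCache`, `gamma`, and the cost rule are illustrative names only.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    span: int          # number of original frames summarized by this entry
    kv: list[float]    # toy stand-in for a KV block (real blocks are tensors)


def merge(older: Entry, newer: Entry) -> Entry:
    """Span-weighted average of two adjacent entries (an assumed merge rule)."""
    total = older.span + newer.span
    kv = [(older.span * a + newer.span * b) / total
          for a, b in zip(older.kv, newer.kv)]
    return Entry(total, kv)


class BoundedFadingCache:
    def __init__(self, budget: int, gamma: float = 1.0):
        if budget < 1:
            raise ValueError("budget must be at least 1")
        self.budget = budget
        self.gamma = gamma
        self.entries: list[Entry] = []  # ordered oldest first

    def insert(self, kv: list[float]) -> None:
        # New history always enters as a fine-grained entry covering one frame.
        self.entries.append(Entry(1, list(kv)))
        while len(self.entries) > self.budget:
            self._merge_one_pair()

    def _merge_one_pair(self) -> None:
        best_i, best_cost = 0, float("inf")
        newer_frames = 0  # frames strictly newer than the candidate pair
        for i in range(len(self.entries) - 2, -1, -1):
            pair_span = self.entries[i].span + self.entries[i + 1].span
            # Hypothetical cost: merged span relative to a power law in distance.
            cost = pair_span / (1 + newer_frames) ** self.gamma
            if cost <= best_cost:  # ties go to the older, more distant pair
                best_i, best_cost = i, cost
            newer_frames += self.entries[i + 1].span
        merged = merge(self.entries[best_i], self.entries[best_i + 1])
        self.entries[best_i:best_i + 2] = [merged]


cache = BoundedFadingCache(budget=4)
for t in range(11):
    cache.insert([float(t)])
print([entry.span for entry in cache.entries])
```

In this toy run the cache never exceeds four entries, and the oldest entry ends up covering more frames than the newest one. Because merging is a weighted average, the span-weighted mean of the cache stays equal to the mean of everything inserted. That property belongs to this sketch and is not a claim about FadeMem itself.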

FadeMem does not require architectural changes. In our experiments, it improves long-video consistency, background stability, and temporal coherence over existing bounded-cache strategies.
