Hugging Face Daily Papers · June 10, 2026 · 3 min read

Dynamic Linear Attention

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

Like Read original ↗

DLA (Dynamic Linear Attention) is a framework for multi-state linear attention that uses information-aware dynamic state merging and capacity-bounded memory modeling to improve long-context LLM scalability.</p>\n","updatedAt":"2026-06-10T02:17:57.548Z","author":{"_id":"6039478ab3ecf716b1a5fd4d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6039478ab3ecf716b1a5fd4d/_Thy4E7taiSYBLKxEKJbT.jpeg","fullname":"taesiri","name":"taesiri","type":"user","isPro":true,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":314,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.8791528344154358},"editors":["taesiri"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/6039478ab3ecf716b1a5fd4d/_Thy4E7taiSYBLKxEKJbT.jpeg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2606.10650","authors":[{"_id":"6a28c949e7d78ea7587e53bc","name":"Xin Wang","hidden":false},{"_id":"6a28c949e7d78ea7587e53bd","name":"Hui Shen","hidden":false},{"_id":"6a28c949e7d78ea7587e53be","name":"Boyuan Zheng","hidden":false},{"_id":"6a28c949e7d78ea7587e53bf","name":"Xueshen Liu","hidden":false},{"_id":"6a28c949e7d78ea7587e53c0","name":"Minkyoung Cho","hidden":false},{"_id":"6a28c949e7d78ea7587e53c1","name":"Zhongwei Wan","hidden":false},{"_id":"6a28c949e7d78ea7587e53c2","name":"Zesen Zhao","hidden":false},{"_id":"6a28c949e7d78ea7587e53c3","name":"Zhuoqing Mao","hidden":false},{"_id":"6a28c949e7d78ea7587e53c4","name":"Shen Yan","hidden":false},{"_id":"6a28c949e7d78ea7587e53c5","name":"Mi Zhang","hidden":false}],"publishedAt":"2026-06-09T00:00:00.000Z","submittedOnDailyAt":"2026-06-10T00:00:00.000Z","title":"Dynamic Linear Attention","submittedOnDailyBy":{"_id":"6039478ab3ecf716b1a5fd4d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6039478ab3ecf716b1a5fd4d/_Thy4E7taiSYBLKxEKJbT.jpeg","isPro":true,"fullname":"taesiri","user":"taesiri","type":"user","name":"taesiri"},"summary":"The scalability of Large Language Models (LLMs) to long contexts is fundamentally constrained by the quadratic complexity of standard attention, motivating the adoption of linear attention mechanisms with sub-quadratic cost. To improve representation capacity under long contexts, recent approaches organize memory in a multi-state manner. However, existing multi-state linear attention methods rely on fixed state merging policies that cannot adapt to dynamically varying token importance, irreversibly obscuring critical tokens and causing severe error accumulation over long sequences. To address this limitation, we propose DLA, a dynamic memory modeling framework for multi-state linear attention. DLA introduces (i) Information-Aware Dynamic State Merging, which adaptively determines state boundaries based on token-level information variation, preserving high-resolution representations around semantic transitions while aggressively summarizing stable regions, and (ii) Capacity-Bounded Memory Modeling, which maintains a fixed-size, chronologically ordered state cache by selectively merging adjacent low-information states to control memory growth with minimal information loss. We pre-train DLA on two different linear attention models and evaluate on 16 datasets across three categories. Experimental results demonstrate the superiority of DLA over state-of-the-art.","upvotes":3,"discussionId":"6a28c949e7d78ea7587e53c6","ai_summary":"DLA addresses limitations in long-context LLMs by introducing adaptive state merging and capacity-bounded memory modeling for improved multi-state linear attention.","ai_keywords":["Large Language Models","standard attention","linear attention mechanisms","multi-state linear attention","token importance","information-aware dynamic state merging","capacity-bounded memory modeling","pre-training","experimental evaluation"],"ai_summary_model":"Qwen/Qwen2.5-Coder-32B-Instruct","organization":{"_id":"67d1140985ea0644e2f14b99","name":"ByteDance-Seed","fullname":"ByteDance Seed","avatar":"https://cdn-avatars.huggingface.co/v1/production/uploads/6535c9e88bde2fae19b6fb25/flkDUqd_YEuFsjeNET3r-.png"}},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"6039478ab3ecf716b1a5fd4d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6039478ab3ecf716b1a5fd4d/_Thy4E7taiSYBLKxEKJbT.jpeg","isPro":true,"fullname":"taesiri","user":"taesiri","type":"user"},{"_id":"6270324ebecab9e2dcf245de","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6270324ebecab9e2dcf245de/cMbtWSasyNlYc9hvsEEzt.jpeg","isPro":false,"fullname":"Kye Gomez","user":"kye","type":"user"},{"_id":"64b2f97434a92b848c7e941e","avatarUrl":"/avatars/c699c50f3b43cd1641469521127753bb.svg","isPro":false,"fullname":"Nagori","user":"MohammedNaeem","type":"user"}],"acceptLanguages":["en"],"dailyPaperRank":0,"organization":{"_id":"67d1140985ea0644e2f14b99","name":"ByteDance-Seed","fullname":"ByteDance Seed","avatar":"https://cdn-avatars.huggingface.co/v1/production/uploads/6535c9e88bde2fae19b6fb25/flkDUqd_YEuFsjeNET3r-.png"},"markdownContentUrl":"https://huggingface.co/buckets/huggingchat/papers-content/resolve/2606/2606.10650.md"}">

Papers

arxiv:2606.10650

Dynamic Linear Attention

Published on Jun 9

· Submitted by

taesiri on Jun 10

ByteDance Seed

Upvote

Authors:

Abstract

DLA addresses limitations in long-context LLMs by introducing adaptive state merging and capacity-bounded memory modeling for improved multi-state linear attention.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

The scalability of Large Language Models (LLMs) to long contexts is fundamentally constrained by the quadratic complexity of standard attention, motivating the adoption of linear attention mechanisms with sub-quadratic cost. To improve representation capacity under long contexts, recent approaches organize memory in a multi-state manner. However, existing multi-state linear attention methods rely on fixed state merging policies that cannot adapt to dynamically varying token importance, irreversibly obscuring critical tokens and causing severe error accumulation over long sequences. To address this limitation, we propose DLA, a dynamic memory modeling framework for multi-state linear attention. DLA introduces (i) Information-Aware Dynamic State Merging, which adaptively determines state boundaries based on token-level information variation, preserving high-resolution representations around semantic transitions while aggressively summarizing stable regions, and (ii) Capacity-Bounded Memory Modeling, which maintains a fixed-size, chronologically ordered state cache by selectively merging adjacent low-information states to control memory growth with minimal information loss. We pre-train DLA on two different linear attention models and evaluate on 16 datasets across three categories. Experimental results demonstrate the superiority of DLA over state-of-the-art.

View arXiv page View PDF Add to collection

Community

taesiri

Paper submitter about 15 hours ago

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Get this paper in your agent:

hf papers read 2606.10650

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2606.10650 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2606.10650 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2606.10650 in a Space README.md to link it from this page.

Collections including this paper 3

Discussion (0)

No comments yet. Sign in and be the first to say something.

Dynamic Linear Attention

Abstract

Community

Models citing this paper 0

Datasets citing this paper 0

Spaces citing this paper 0

Collections including this paper 3

Discussion (0)

More from Hugging Face Daily Papers