Hugging Face Daily Papers · June 2, 2026 · 3 min read

LongAttnComp: Cross-Family Context Compression for Long-Context Reasoning

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

This paper presents a fine-tuning-based long-context compressor with a two-stage training recipe, which achieves strong long-context reasoning performance. A shorter version was accepted at the AdaptFM Workshop at ICML 2026.

arxiv:2606.01336

Published on May 31, 2026

Submitted by mengmeng ji on Jun 2, 2026

SambaNova

Authors: Mengmeng Ji, Ravi Shanker Raju, Jonathan Lingjie Li, Chen Wu

Abstract

AI-generated summary: LongAttnComp adapts AttnComp for long-context processing by fine-tuning a lightweight cross-attention scoring layer and adding token-level chunking and positional reordering.

As real-world applications increasingly require processing inputs of 100k+ tokens, the gap between context length and inference efficiency has become a critical bottleneck. Context compression offers a way to reduce prefill costs while preserving task accuracy. However, existing training-free attention-based methods leave substantial gaps in demanding long-context tasks such as code reasoning. We present LongAttnComp, a long-context adaptation of AttnComp that fine-tunes a lightweight cross-attention scoring layer and introduces token-level chunking, a token-budget top-p algorithm, positional reordering, and a format-agnostic query parser. We further design a two-stage fine-tuning recipe for the compressor: Stage 1 builds a general retrieval foundation from needle-in-a-haystack (NIAH) style data, and Stage 2 extends it with multi-hop and reasoning data for broader long-context task coverage. On InfiniteBench Code-Debug, LongAttnComp matches or exceeds full-context accuracy, substantially outperforms training-free baselines, and transfers across four target models from three families. On LongBench v2, the two-stage recipe largely closes the Stage 1 gap on multi-document reasoning while preserving Code-Debug performance.
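The abstract names token-level chunking, a token-budget top-p selection, and positional reordering, but does not define them. The Python sketch below is one plausible reading of how such steps could fit together, not the paper's implementation. The function names (`chunk_tokens`, `select_chunks`), the fixed chunk size, the skip-if-too-large budget rule, and the hand-written `scores` (standing in for the output of the cross-attention scoring layer) are all assumptions made for illustration.

```python
from typing import List, Sequence


def chunk_tokens(tokens: Sequence[str], chunk_size: int) -> List[List[str]]:
    """Split a token sequence into consecutive chunks of at most chunk_size tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [list(tokens[i:i + chunk_size]) for i in range(0, len(tokens), chunk_size)]


def select_chunks(scores: Sequence[float], lengths: Sequence[int],
                  top_p: float, token_budget: int) -> List[int]:
    """Assumed selection rule: take chunks in descending score order until the
    kept chunks cover a top_p share of the total score, never exceeding
    token_budget tokens. Indices are returned in original document order."""
    total = sum(scores)
    if total <= 0:
        return []
    by_score = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    mass, used = 0.0, 0
    for i in by_score:
        if mass >= top_p:
            break  # enough of the score mass is already covered
        if used + lengths[i] > token_budget:
            continue  # this chunk does not fit; try lower-scored ones
        kept.append(i)
        mass += scores[i] / total
        used += lengths[i]
    # Positional reordering: restore the original order of the kept chunks.
    return sorted(kept)


# Toy input: 10 "tokens" and made-up relevance scores, one per chunk.
tokens = "a b c d e f g h i j".split()
chunks = chunk_tokens(tokens, chunk_size=3)
scores = [5.0, 1.0, 12.0, 2.0]  # hypothetical scorer output
kept = select_chunks(scores, [len(c) for c in chunks], top_p=0.8, token_budget=6)
compressed = [tok for i in kept for tok in chunks[i]]
print(kept, compressed)
```

In this toy run the highest-scoring chunk is the third one, yet the compressed context keeps the first chunk ahead of it, which is the effect positional reordering is meant to have under this reading.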
