Hugging Face Daily Papers · 7 min read

ACC: Compiling Agent Trajectories for Long-Context Training

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2605.21850

Published on May 21, 2026 · Submitted by SII-sqs on May 22, 2026
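Authors: Qisheng Su, Zhen Fang, Shiting Huang, Yu Zeng, Yiming Zhao, Kou Shi, Ziao Zhang, Lin Chen, Zehui Chen, Lijun Wu, Feng Zhao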

Abstract

AI-generated summary: Agent Context Compilation (ACC) enhances long-context reasoning in LLMs by converting multi-turn agent trajectories into structured QA pairs, enabling direct supervision of distant context integration without additional annotation.
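The conversion this summary describes can be pictured with a minimal Python sketch. Everything here is an assumption for illustration: the `Turn` and `QAPair` types, the role names, the prompt layout (question first, then numbered observations), and the closing instruction are hypothetical and not taken from the paper's released data format. The sketch keeps tool responses and environment observations in their original order, drops the agent's intermediate tool calls, and pairs the result with the trajectory's final answer as a single SFT target.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical roles treated as evidence; the paper's actual trajectory
# format is not described on this page.
EVIDENCE_ROLES = {"tool", "observation"}


@dataclass
class Turn:
    role: str     # e.g. "assistant", "tool", "observation"
    content: str  # tool-call text or returned observation


@dataclass
class QAPair:
    prompt: str   # original question plus compiled evidence
    answer: str   # supervised target, answered without tool use


def compile_trajectory(question: str, turns: List[Turn], final_answer: str) -> QAPair:
    """Flatten one agent trajectory into a single long-context QA pair.

    Evidence turns are kept in their original order, so facts stay spread
    across the context; the assistant's intermediate tool calls are dropped.
    """
    evidence = [t.content for t in turns if t.role in EVIDENCE_ROLES]
    blocks = [f"[Observation {i}]\n{text}" for i, text in enumerate(evidence, start=1)]
    prompt = (
        f"Question: {question}\n\n"
        + "\n\n".join(blocks)
        + "\n\nAnswer the question directly, without calling any tools."
    )
    return QAPair(prompt=prompt, answer=final_answer)


if __name__ == "__main__":
    # Toy database-agent trajectory (illustrative values only).
    trajectory = [
        Turn("assistant", "SELECT COUNT(*) FROM orders WHERE status = 'shipped';"),
        Turn("tool", "count = 3"),
        Turn("assistant", "SELECT COUNT(*) FROM orders WHERE status = 'returned';"),
        Turn("tool", "count = 1"),
    ]
    pair = compile_trajectory(
        "How many more orders were shipped than returned?", trajectory, "2"
    )
    print(pair.prompt)
```

Keeping observations in trajectory order preserves the distance between the question and the evidence it needs; whether the paper also filters, truncates, or reorders turns is not stated on this page.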

The recent development of agents has renewed demand for the long-context reasoning capacity of LLMs. However, training LLMs for this capacity requires costly long-document curation or heuristic context synthesis. We observe that agents produce massive trajectories when solving problems, invoking tools and receiving environment observations across many turns. The evidence needed to answer the original question is thus scattered throughout these turns, requiring integration of distant context segments. Nevertheless, standard agent SFT masks tool responses and only trains turn-level tool selection, creating a supervision blind spot where these scattered signals go unused. We propose Agent Context Compilation (ACC), which converts trajectories from search, software engineering, and database querying agents into long-context QA pairs that combine the original question with tool responses and environment observations gathered across multiple turns, training the model to answer directly without tool use. This makes the dependencies between the question and the evidence explicit, enabling direct supervision of long-context reasoning over distant segments without additional annotation. ACC is a simple but effective approach that can be combined with any existing long-context extension or training method, providing scalable supervised fine-tuning data. We validate ACC on long-range dependency modeling tasks through MRCR and GraphWalks, challenging benchmarks requiring cross-turn coreference resolution and graph traversal over extended contexts. Training Qwen3-30B-A3B with ACC achieves 68.3 on MRCR (+18.1) and 77.5 on GraphWalks (+7.6), results comparable to Qwen3-235B-A22B, while preserving general capabilities on GPQA, MMLU-Pro, AIME, and IFEval. Further mechanism analysis reveals that the ACC-trained model exhibits task-adaptive attention restructuring and expert specialization. Dataset and checkpoints are released publicly.
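The supervision gap described in the abstract can be illustrated with per-token loss labels. This is a hedged sketch, not the authors' training code: token ids are assumed to be precomputed per turn, and the role names and function names are hypothetical. Positions labelled `-100` are skipped by PyTorch's `CrossEntropyLoss` under its default `ignore_index`.

```python
from typing import List, Tuple

# -100 is the default ignore_index of torch.nn.CrossEntropyLoss, so
# positions labelled with it contribute nothing to the loss.
IGNORE_INDEX = -100


def agent_sft_labels(turns: List[Tuple[str, List[int]]]) -> List[int]:
    """Labels for a standard agent-SFT sample.

    Only assistant tokens (the turn-level tool calls) are supervised;
    user and tool-response tokens are masked out.
    """
    labels: List[int] = []
    for role, token_ids in turns:
        if role == "assistant":
            labels.extend(token_ids)
        else:
            labels.extend([IGNORE_INDEX] * len(token_ids))
    return labels


def acc_labels(prompt_ids: List[int], answer_ids: List[int]) -> List[int]:
    """Labels for a compiled ACC-style sample.

    The question and all gathered evidence form a masked prompt; the only
    supervised tokens are the direct answer, conditioned on that evidence.
    """
    return [IGNORE_INDEX] * len(prompt_ids) + list(answer_ids)


if __name__ == "__main__":
    # Hypothetical token ids for: question, tool call, tool response, tool call.
    turns = [("user", [11, 12]), ("assistant", [21]), ("tool", [31, 32, 33]), ("assistant", [22])]
    print(agent_sft_labels(turns))  # [-100, -100, 21, -100, -100, -100, 22]
    print(acc_labels([11, 12, 31, 32, 33], [41, 42]))  # [-100, -100, -100, -100, -100, 41, 42]
```

In the agent-SFT labels every supervised position is a tool call, while the tool-response tokens that carry the evidence are masked; in the compiled sample, the supervised answer is conditioned on evidence spread across the masked prompt.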
