Hugging Face Daily Papers · June 2, 2026 · 5 min read

Masking Stale Observations Helps Search Agents -- Until It Doesn't: A Regime Map and Its Mechanism

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.


Submitter's note (IPF): We present a systematic study of observation masking, a lightweight context management (CM) technique for long-horizon search agents. By evaluating diverse backbones (4B–284B) and retrievers, we establish a quantitative regime map showing that CM gains follow an asymmetric inverted-U shape governed by the mismatch between retriever recall and a model's implicit filtering capacity. Masking provides boosts in intermediate regimes by removing the "neglected middle noise" that models fail to filter, but its utility collapses once advanced models become saturated. Attention and behavioral tracking show that masking forces weaker models to structurally align with stronger ones, but risks evicting critical signals in capable readers. Our findings reframe context management as a regime-dependent intervention and suggest shifting future efforts from aggressive pruning toward high-fidelity retrieval.


arxiv:2606.00408


Published on May 29

Submitted by Haoxiang Zhang on Jun 2

McAuley-Lab


Authors: Haoxiang Zhang, Qixin Xu, Zhuofeng Li, Lei Zhang, Pengcheng Jiang, Yu Zhang, Julian McAuley

Abstract

AI-generated summary: Observation masking in long-horizon search agents shows variable accuracy gains depending on the interaction between retriever capability and model capacity, following an asymmetric inverted-U pattern.

Long-horizon search agents accumulate large amounts of retrieved content across many tool calls, making context-budget efficiency increasingly important. A minimal intervention is to mask stale observations from the context as the trajectory progresses, but it remains unclear when this form of context management helps and why. We study observation masking through a systematic sweep over various agent backbones (4B to 284B parameters) and three retrievers on offline and live-web agentic search benchmarks. We find that the accuracy gain from masking follows an asymmetric inverted-U shape when plotted against the model's accuracy without context management: a plateau under weak retrievers, a peak when a strong retriever meets a mid-capacity model, and a sharp collapse when the model is saturated. This pattern reflects the interaction between retriever recall and the model's implicit filtering capacity, rather than either factor in isolation. Mechanistically, masking implements a token-for-turn trade-off: it removes observations the model has largely stopped attending to and pages the agent rarely re-opens. The added turns help when they convert failures into successes, but they fail when masking removes evidence the model would otherwise have used. We therefore reframe context management as a regime-dependent intervention and provide a holistic perspective for analyzing context use in agentic deep search. We release our scaffold and trajectories here (https://github.com/i-DeepSearch/observation-masking) to support future research.
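To make the masking idea in the abstract concrete, here is a minimal sketch, not the authors' released scaffold. It assumes the agent history is a list of chat-style messages with `role` and `content` keys and that tool observations carry the role `"tool"`. The function name `mask_stale_observations`, the `keep_last` parameter, and the placeholder string are all hypothetical choices made for illustration.

```python
from typing import Dict, List

# Hypothetical placeholder text; the paper's scaffold may use something else.
PLACEHOLDER = "[observation masked]"


def mask_stale_observations(
    messages: List[Dict[str, str]], keep_last: int = 2
) -> List[Dict[str, str]]:
    """Return a copy of `messages` in which every tool observation except the
    `keep_last` most recent ones has its content replaced by a placeholder.

    Assumption: observations are the messages whose role is "tool".
    """
    tool_indices = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    # A slice of [-0:] would keep everything, so handle keep_last <= 0 explicitly.
    fresh = set(tool_indices[-keep_last:]) if keep_last > 0 else set()

    masked = []
    for i, m in enumerate(messages):
        if m["role"] == "tool" and i not in fresh:
            # Stale observation: keep the turn, drop the retrieved content.
            masked.append({**m, "content": PLACEHOLDER})
        else:
            masked.append(dict(m))
    return masked
```

Calling this before each model step shrinks the context by dropping old retrieved content while keeping the turn structure intact. This is the token side of the token-for-turn trade-off the abstract describes: if a masked page later turns out to matter, the agent has to spend extra turns retrieving it again.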



