Hugging Face Daily Papers · 5 min read

SAAS: Self-Aware Reinforcement Learning for Over-Search Mitigation in Agentic Search

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2605.29796

Published on May 28, 2026 · Submitted by LiuShiyu on Jun 1, 2026
Authors: Yunbo Tang, Chengyi Yang, Shiyu Liu, Zhishang Xiang, Zerui Chen, Qinggang Zhang, Jinsong Su
Organization: Xiamen University (XMU)

Abstract

AI-generated summary: SAAS introduces a reinforcement learning framework that enhances agent self-awareness to reduce unnecessary searches in LLM-based question answering systems.

Agentic search enables LLMs to solve complex multi-hop questions through iterative reasoning and external search. Despite their effectiveness, these systems often suffer from a critical limitation in practice: agents fail to recognize their own knowledge boundaries, blindly triggering searches when internal knowledge suffices and failing to stop searching even when adequate evidence has been collected. This lack of self-awareness leads to severe over-search, incurring substantial inference latency and prohibitive computational cost. To address this, we propose SAAS, a novel RL framework designed to cultivate dynamic self-awareness that precisely regulates search behavior without compromising accuracy. SAAS introduces three key components: (i) a search boundary modeling mechanism, which identifies the search boundary under the evolving policy by contrasting search-disabled and search-enabled rollouts; (ii) a boundary-aware reward module, which translates this boundary awareness into trajectory-level penalties, suppressing unnecessary and redundant searches; and (iii) a stage-wise optimization strategy, which uses a sequential curriculum to prioritize reasoning over search regularization, thereby avoiding reward hacking. Extensive experiments demonstrate that SAAS substantially reduces over-search while maintaining accuracy. Our code is anonymously released at https://github.com/XMUDeepLIT/SAAS.
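
The abstract names the three components but gives no formulas, so the following is only a minimal sketch of how they might fit together, not the authors' implementation. Everything in it is an assumption made for illustration: the `Rollout` structure, the accuracy `threshold` that decides whether a question needs search, the "fewest searches among correct rollouts" budget, the linear `penalty`, and the two-stage switch (stage 1 rewards correctness only, stage 2 adds the search penalty).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rollout:
    """One sampled trajectory for a question (hypothetical structure)."""
    correct: bool      # did the final answer match the reference?
    num_searches: int  # number of search calls issued in the trajectory


def needs_search(no_search: List[Rollout], threshold: float = 0.5) -> bool:
    """Assumed boundary rule: search is needed when search-disabled
    rollouts are correct less often than `threshold`."""
    accuracy = sum(r.correct for r in no_search) / len(no_search)
    return accuracy < threshold


def search_budget(no_search: List[Rollout],
                  with_search: List[Rollout],
                  threshold: float = 0.5) -> int:
    """Assumed per-question budget derived from contrasting both rollout groups."""
    if not needs_search(no_search, threshold):
        return 0  # internal knowledge suffices, so every search is unnecessary
    correct_counts = [r.num_searches for r in with_search if r.correct]
    if correct_counts:
        return min(correct_counts)  # fewest searches that still gave a correct answer
    # No correct rollout: use the largest count so that nothing is penalized.
    return max(r.num_searches for r in with_search)


def reward(rollout: Rollout, budget: int, stage: int, penalty: float = 0.1) -> float:
    """Trajectory-level reward. Stage 1 (assumed): correctness only.
    Stage 2 (assumed): correctness minus a penalty per search beyond the budget."""
    base = 1.0 if rollout.correct else 0.0
    if stage == 1:
        return base
    excess = max(0, rollout.num_searches - budget)
    return base - penalty * excess


# Usage: the policy already answers this question without search,
# so a correct trajectory that searched twice is penalized in stage 2.
no_search = [Rollout(True, 0), Rollout(True, 0), Rollout(False, 0)]
with_search = [Rollout(True, 2), Rollout(True, 1)]
budget = search_budget(no_search, with_search)
stage1 = reward(Rollout(True, 2), budget, stage=1)
stage2 = reward(Rollout(True, 2), budget, stage=2)
```

Delaying the penalty until a second stage reflects the abstract's stated motivation: if searches were penalized from the start, the policy could raise its reward simply by never searching, which is one form of reward hacking.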


Get this paper in your agent:

hf papers read 2605.29796
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
