Hugging Face Daily Papers · 3 min read

SlimSearcher: Training Efficiency-Aware Web Agents via Adaptive Reward Gating

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

Papers
arxiv:2606.07074

Published on Jun 5 · Submitted by dan on Jun 9
Authors: Zequn Xie, Junjie Wang, Dan Yang, Jie Feng, Yue Shen, Jian Wang, Jinjie Gu

Abstract

SlimSearcher is a framework that improves efficiency in deep research agents by combining Pareto-efficient trajectory filtering and adaptive reward shaping to reduce computational costs while maintaining accuracy.

Deep research agents have demonstrated remarkable capabilities in complex information-seeking tasks, yet this power comes at a steep computational cost. Driven by accuracy-focused training paradigms, current models adopt brute-force strategies characterized by blind tool dependency and performative reasoning, generating long, redundant trajectories that go far beyond what these tasks require and leading to wasteful tool calls and excessive token consumption. To overcome this efficiency trap, we propose SlimSearcher, a principled framework that pushes the Pareto frontier between accuracy and computational cost across both Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL). In the SFT stage, SlimSearcher employs Pareto-efficient filtration to distill trajectories that are both successful and economical, guiding the model toward inherently efficiency-aware search behaviors. During RL, we introduce Adaptive Reward Gating, a dynamic reward-shaping mechanism that evaluates relative tool and token efficiency within a sampled cohort. By cascading these adaptive efficiency metrics with a strict correctness gate, our approach avoids the brevity bias associated with absolute penalties and mitigates reward hacking. Extensive experiments on long-horizon benchmarks, including GAIA, BrowseComp, and XBenchDeepSearch, demonstrate that SlimSearcher reduces average tool-call rounds by 17% to 58% while maintaining or improving accuracy.
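
The abstract does not give formulas, so the following is only a minimal sketch of the two ideas it names, under assumptions of our own. `pareto_filter` illustrates one plausible reading of Pareto-efficient filtration: keep correct trajectories that no other correct trajectory beats on both tool calls and tokens. `gated_rewards` illustrates a correctness gate followed by a cohort-relative efficiency bonus. The `Trajectory` fields, the use of the cohort mean as the baseline, and the `bonus` value are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Trajectory:
    """Hypothetical record of one agent rollout (fields are assumptions)."""
    correct: bool    # did the final answer pass the correctness check?
    tool_calls: int  # number of tool-call rounds used
    tokens: int      # total tokens generated


def pareto_filter(trajs):
    """Keep correct trajectories that are not dominated on (tool_calls, tokens)."""
    ok = [t for t in trajs if t.correct]

    def dominates(a, b):
        # a dominates b if it is no worse on both costs and strictly better on one.
        no_worse = a.tool_calls <= b.tool_calls and a.tokens <= b.tokens
        better = a.tool_calls < b.tool_calls or a.tokens < b.tokens
        return no_worse and better

    return [t for t in ok if not any(dominates(o, t) for o in ok)]


def gated_rewards(cohort, bonus=0.2):
    """Correctness gate first, then a bonus relative to the cohort (assumed form)."""
    ok = [t for t in cohort if t.correct]
    if not ok:
        # No correct rollout in the cohort: every reward stays at zero.
        return [0.0] * len(cohort)

    # Assumed baseline: mean cost over the correct rollouts in this cohort.
    avg_calls = mean(t.tool_calls for t in ok)
    avg_tokens = mean(t.tokens for t in ok)

    rewards = []
    for t in cohort:
        if not t.correct:
            # Gate closed: a short but wrong rollout earns nothing.
            rewards.append(0.0)
            continue
        r = 1.0
        if t.tool_calls <= avg_calls:
            r += bonus / 2  # at or below the cohort's average tool use
        if t.tokens <= avg_tokens:
            r += bonus / 2  # at or below the cohort's average token use
        rewards.append(r)
    return rewards
```

Because the efficiency bonus is only reachable through the correctness gate and is measured against the cohort rather than an absolute budget, a wrong answer gains nothing by being brief. This is the property the abstract attributes to cascading the efficiency metrics with a strict correctness gate.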

Community

Paper submitter about 10 hours ago

slim searcher for search agent efficiency

Get this paper in your agent:

hf papers read 2606.07074
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
