arXiv — NLP / Computation & Language · May 28, 2026 · 3 min read

Retrieval, Reward, and Training Protocols: What Matters in Training Search Agents?

Mirrored from arXiv — NLP / Computation & Language for archival readability. Support the source by reading on the original site.

arXiv:2605.27881 (cs)

[Submitted on 27 May 2026]

Title: Retrieval, Reward, and Training Protocols: What Matters in Training Search Agents?

Authors: Yibo Zhao, Zichen Ding, Jiayi Wu, Zun Wang, Xiang Li

Abstract: Search agents powered by large language models can autonomously decompose queries, retrieve information, and synthesize answers through multi-step reasoning. However, the rapid growth of training methods has outpaced controlled comparison: existing works differ in retrieval corpora, reward designs, and training protocols, making it unclear what actually drives improvements. We present a controlled empirical study that isolates three under-explored dimensions of search agent training. First, we identify a critical data-coverage issue in the widely used Wikipedia 2018 corpus and show that correcting it alone yields larger gains than the differences between training algorithms. Second, we systematically compare outcome-based and process-based reward methods across three base models, finding that the simplest outcome-based approach achieves competitive or superior performance in most settings, and that process-level credit assignment can over-correct agent behavior. Third, we analyze training data diversity, off-policy data utilization, and search budget scaling, distilling practical guidelines for training effective search agents. Our code is available at this https URL.
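
To make the distinction between outcome-based and process-based rewards concrete, here is a minimal Python sketch. It is not taken from the paper: the `Step` record, the exact-match outcome reward, and the "gold answer appears in a retrieved passage" step reward are all assumptions chosen for illustration, and the paper's actual reward definitions may differ.

```python
import re
import string
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One search step of a hypothetical agent trajectory."""
    query: str            # search query issued at this step
    retrieved: List[str]  # passages returned by the retriever


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def outcome_credit(steps: List[Step], prediction: str, gold: str) -> List[float]:
    """Outcome-based (assumed form): score only the final answer by exact
    match, then give every step in the trajectory that same scalar."""
    reward = 1.0 if normalize(prediction) == normalize(gold) else 0.0
    return [reward] * len(steps)


def process_credit(steps: List[Step], gold: str) -> List[float]:
    """Process-based (assumed form): score each step on its own, here by
    whether any passage it retrieved contains the gold answer."""
    target = normalize(gold)
    return [
        1.0 if any(target in normalize(p) for p in step.retrieved) else 0.0
        for step in steps
    ]


steps = [
    Step("who wrote Hamlet", ["Hamlet is a tragedy."]),
    Step("Hamlet author", ["Hamlet was written by William Shakespeare."]),
]
outcome = outcome_credit(steps, "The William Shakespeare", "William Shakespeare")
process = process_credit(steps, "William Shakespeare")
print(outcome)  # [1.0, 1.0]
print(process)  # [0.0, 1.0]
```

In this toy trajectory the outcome-based scheme rewards both steps because the final answer is correct, while the step-level scheme gives the first search no credit even though it was part of a successful trajectory. Whether this resembles the over-correction the abstract mentions is not something the abstract specifies.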

Comments:	18 pages, 4 figures, and 15 tables
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2605.27881 [cs.CL]
	(or arXiv:2605.27881v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2605.27881 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: YiBo Zhao
[v1] Wed, 27 May 2026 03:04:36 UTC (310 KB)