Hugging Face Daily Papers · June 8, 2026 · 7 min read

Critic-R: Improving Agentic Search using Instruction-tuned Retrievers with Natural Language Introspective Feedback

#model-release #agents #reasoning #funding

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.




arxiv:2606.00590


Published on May 30

Submitted by Md Zarif Ul Alam on Jun 8 · University of Massachusetts Amherst


Authors: Md Zarif Ul Alam, Alireza Salemi, Hamed Zamani

Abstract

The Critic-R framework enhances agentic search by closing the feedback loop between reasoning agents and retrieval models through critic evaluation and dual optimization mechanisms.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Agentic search systems iteratively interact with retrieval models to answer complex queries. Despite substantial progress, optimizing retrievers for agentic search remains challenging, often requiring heavy co-training or gold-standard annotations that limit real-world applicability. We propose Critic-R, a framework that explicitly closes the feedback loop between the reasoning agent and the retrieval model during both inference and training. Critic-R introduces a critic model that evaluates the agent's introspective reasoning trace after consuming retrieved evidence to determine whether the retrieved context sufficiently supports the next reasoning step. Critic-R has two complementary mechanisms: Critic-R-Zero, an inference-time query refinement loop that iteratively rewrites queries and retrieval instructions, and Critic-Embed, an optimization approach for retrieval models that leverages successful and failed refinement trajectories as automatic supervision without requiring manual relevance annotation. We evaluate Critic-R on HotpotQA, 2WikiMultihopQA, MuSiQue, and Bamboogle. Results show that Critic-R significantly improves both retrieval quality and downstream answer accuracy.

GitHub: https://github.com/zarif98sjs/Critic-R

Community

zarif98sjs · Paper author, paper submitter · about 4 hours ago (edited about 4 hours ago)

What's the real bottleneck in your search agent? Often it's the retriever, and you don't need to retrain your agent to fix it.

🗞️ Most existing agentic search approaches (like Search-R1) primarily optimize the reasoning agent while treating the retrieval model as a frozen black-box component. This design implicitly assumes that a sufficiently capable reasoning model can compensate for retrieval failures through improved query reformulation alone. We challenge this assumption, arguing that sub-optimal retrieval can be a bottleneck in agentic search performance. There have been some attempts (Agentic-R, CoSearch) to address this issue by jointly optimizing retrievers and reasoning agents. In practice, however, these methods are difficult to apply in settings where the reasoning model cannot be further trained, the retriever is externally provided, or gold-passage supervision is unavailable.

♦️ To address this, we propose Critic-R, a framework that closes the feedback loop between the reasoning agent and the retriever, at both inference and training time. Instead of blindly accepting whatever the retriever returns, Critic-R uses a separate critic model that reads the agent's introspective reasoning trace after it consumes the retrieved documents, and decides whether that evidence is actually sufficient to support the next reasoning step.

This verification signal powers two complementary mechanisms:
🔹 Critic-R-Zero (inference-time): when the critic finds the evidence insufficient, it rewrites the retrieval query and instruction based on the reasoning agent's own introspective feedback and tries again, until the agent is satisfied or the refinement budget runs out. There are no gradient updates anywhere, the agent is untouched, and it works on top of any retriever, including those from Agentic-R or CoSearch.
🔹 Critic-Embed (training-time): to amortize the cost of refinement, we turn Critic-R-Zero's own trajectories into supervision. Documents that satisfy the agent become positives; documents rejected during failed refinement become hard intra-trajectory negatives. The retriever is fine-tuned with this signal, with no gold-passage annotations required.
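
The two mechanisms above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Verdict`, `Result`, `refine`, and `to_training_example`, the callable signatures, and the choice to pair mined documents with the original query are all hypothetical. The post does not say how the budget is counted, so here `budget` means the number of rewrites allowed after the first retrieval.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Verdict:
    sufficient: bool            # does the evidence support the next reasoning step?
    new_query: str = ""         # rewritten query, used only when insufficient
    new_instruction: str = ""   # rewritten retrieval instruction


@dataclass
class Result:
    accepted: bool
    docs: List[str]                                    # documents from the last round
    rejected: List[str] = field(default_factory=list)  # documents from failed rounds


def refine(
    query: str,
    instruction: str,
    retrieve: Callable[[str, str], List[str]],
    introspect: Callable[[str, List[str]], str],
    critic: Callable[[str, List[str]], Verdict],
    budget: int = 2,
) -> Result:
    """Inference-time loop: one initial retrieval plus up to `budget` rewrites."""
    rejected: List[str] = []
    docs: List[str] = []
    for round_idx in range(budget + 1):
        docs = retrieve(instruction, query)
        trace = introspect(query, docs)   # the agent's introspective reasoning trace
        verdict = critic(trace, docs)     # the critic reads the trace, not just the docs
        if verdict.sufficient:
            return Result(True, docs, rejected)
        rejected.extend(docs)
        if round_idx < budget:
            query, instruction = verdict.new_query, verdict.new_instruction
    return Result(False, docs, rejected)


def to_training_example(
    query: str, result: Result
) -> Optional[Tuple[str, List[str], List[str]]]:
    """Accepted docs become positives; earlier rejected docs become hard negatives."""
    if not result.accepted:
        return None  # assumption: failed trajectories yield no positives
    negatives = [d for d in result.rejected if d not in result.docs]
    return query, result.docs, negatives


# Toy stand-ins for the retriever, agent, and critic (hypothetical, illustration only).
def toy_retrieve(instruction: str, query: str) -> List[str]:
    if "born" in query:
        return ["Alan Turing was born in 1912."]
    return ["The Imitation Game is a 2014 film."]


def toy_introspect(query: str, docs: List[str]) -> str:
    return "I can answer." if "1912" in docs[0] else "Missing the birth year."


def toy_critic(trace: str, docs: List[str]) -> Verdict:
    if trace == "I can answer.":
        return Verdict(True)
    return Verdict(False, "When was Alan Turing born?", "Retrieve biographical facts.")


original_query = "Turing movie year of birth"
result = refine(original_query, "Retrieve relevant passages.",
                toy_retrieve, toy_introspect, toy_critic)
example = to_training_example(original_query, result)
```

In this toy run the first retrieval is rejected, the rewritten query succeeds, and the rejected document is kept as a hard negative for the same trajectory. No model weights change at any point in `refine`; only `to_training_example` produces data that a retriever could later be fine-tuned on.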

Across HotpotQA, 2Wiki, MuSiQue, and Bamboogle:
✅ Critic-R-Zero achieves a +12.4% relative improvement at inference time alone
✅ Critic-Embed gives a +7.5% improvement when only the retriever is replaced, beating both off-the-shelf and co-trained retrievers
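
For readers comparing these numbers with other papers, it helps to keep relative and absolute gains apart. The sketch below shows the arithmetic only; the function names are hypothetical and the scores are made up for illustration, not taken from the paper.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Percentage change of `improved` relative to `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (improved - baseline) / baseline * 100.0


def absolute_improvement(baseline: float, improved: float) -> float:
    """Difference in score points."""
    return improved - baseline


# Made-up scores for illustration only; not results from the paper.
baseline_score, improved_score = 40.0, 44.96
rel = relative_improvement(baseline_score, improved_score)
abs_gain = absolute_improvement(baseline_score, improved_score)
print(f"relative: {rel:.1f}%  absolute: {abs_gain:.2f} points")
```

With a baseline of 40 points, a 12.4% relative gain corresponds to roughly 5 absolute points, so the same headline percentage means a smaller absolute change on harder benchmarks with lower baselines.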

One interesting finding is that removing the agent's introspective feedback when collecting training data makes the retriever consistently worse. The agent's own sense of what's missing isn't a minor input to the critic; it's the primary supervisory signal Critic-Embed inherits.
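
To make concrete how mined positives and hard negatives could act as a supervisory signal, here is one common contrastive objective (InfoNCE). The post does not state which loss Critic-Embed uses, so this is an assumption for illustration. The symbols are also assumptions: $s$ is the retriever's similarity score, $i$ the retrieval instruction, $q$ the query, $d^{+}$ a document that satisfied the agent, $\mathcal{N}$ the set of documents rejected earlier in the same trajectory, and $\tau$ a temperature.

```latex
\mathcal{L}(i, q) = -\log
  \frac{\exp\!\big(s(i, q, d^{+}) / \tau\big)}
       {\exp\!\big(s(i, q, d^{+}) / \tau\big)
        + \sum_{d^{-} \in \mathcal{N}} \exp\!\big(s(i, q, d^{-}) / \tau\big)}
```

Under an objective of this shape, the quality of the retriever depends entirely on which documents land in $d^{+}$ and $\mathcal{N}$, which is consistent with the finding that weakening the signal used to label them hurts the trained retriever.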

Check out the paper for more details.


Get this paper in your agent:

hf papers read 2606.00590

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 1

