Hugging Face Daily Papers · 4 min read

Towards Self-Evolving Agentic Literature Retrieval

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arXiv:2605.14306


Published on May 14 · Submitted by yuwendu on May 14
Authors: Yuwen Du, Tian Jin, Jing Kang, Xianghe Pang, Jingyi Chai, Tingjia Miao, Fenyi Liu, WenHao Wang, Sikai Yao, Yuzhi Zhang, Siheng Chen

Abstract

PaSaMaster is a self-evolving agentic literature retrieval system that improves academic search accuracy and cost efficiency through iterative intent analysis and evidence-based ranking.

AI-generated summary

As large language models reshape scientific research, literature retrieval faces a twofold challenge: ensuring source authenticity while maintaining a deep comprehension of academic search intents. Traditional keyword-centric search is reliable but fails to capture complex research intents. Frontier LLMs can handle complex research intents, but their high cost and tendency to hallucinate remain key limitations. Here we introduce PaSaMaster, a self-evolving agentic literature retrieval system that produces relevance-scored paper rankings with evidence-grounded recommendations through iterative intent analysis, retrieval, and ranking. It is built on three key designs. First, it transforms literature retrieval from a one-shot query-document matching problem into a search process that evolves over time, using ranked evidence to reveal gaps, refine intents, and guide follow-up searches. Second, it prevents hallucinated sources by treating retrieval as intent-paper relevance ranking rather than generation. Finally, PaSaMaster improves cost efficiency by separating planning from retrieval: a frontier LLM is used only for intent understanding, while large-scale retrieval and relevance scoring are delegated to customized corpora and lightweight models. Evaluated on the PaSaMaster Benchmark across 38 scientific disciplines, our system exposes the severe inaccuracy and incompleteness of traditional keyword retrieval (improving F1-score by 15.6×) and the unreliability of generative LLMs (which exhibit hallucination rates up to 37.79%). Remarkably, PaSaMaster outperforms GPT-5.2 by 30.0% at a mere 1% of the computational cost while ensuring zero source hallucination: https://github.com/sjtu-sai-agents/PaSaMaster
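The three designs described in the abstract can be illustrated with a toy loop: a planner (standing in for the frontier LLM) expands the search intent using previously ranked evidence, while a lightweight scorer ranks entries from a fixed corpus, so a returned source can never be hallucinated. Everything below — the corpus, the keyword-overlap scoring, and the intent expansion — is an illustrative assumption, not the paper's actual implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Paper:
    paper_id: str
    title: str
    keywords: frozenset

# Toy stand-in for the customized corpora that handle large-scale retrieval.
CORPUS = [
    Paper("p1", "Agentic retrieval with LLM planners", frozenset({"agent", "retrieval", "llm"})),
    Paper("p2", "Keyword search at scale", frozenset({"keyword", "retrieval"})),
    Paper("p3", "Hallucination in generative search", frozenset({"llm", "hallucination"})),
]

def analyze_intent(query, evidence):
    """Planner stub (the role delegated to a frontier LLM): derive search
    terms from the query, then expand them with keywords from ranked evidence."""
    terms = set(query.lower().split())
    for paper in evidence:
        terms |= paper.keywords  # ranked evidence reveals gaps and refines the intent
    return terms

def relevance(paper, intent_terms):
    """Lightweight scorer stub: rank real corpus entries rather than generating
    citations, so every returned source exists by construction."""
    return len(paper.keywords & intent_terms)

def retrieve(query, rounds=2, top_k=2):
    """Search process that evolves over rounds instead of one-shot matching."""
    evidence = []
    for _ in range(rounds):
        intent = analyze_intent(query, evidence)
        ranked = sorted(CORPUS, key=lambda p: relevance(p, intent), reverse=True)
        evidence = ranked[:top_k]  # top-ranked evidence feeds the next round
    return evidence
```

The split between `analyze_intent` (called once per round, the expensive planner role) and `relevance` (called per corpus entry, the cheap scorer role) mirrors the cost-efficiency argument: the heavyweight model touches only the intent, never the full corpus.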

Community

Paper submitter

image


Get this paper in your agent:

hf papers read 2605.14306
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

