Hugging Face Daily Papers · May 15, 2026 · 4 min read

Towards Self-Evolving Agentic Literature Retrieval

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

Authors: Yuwen Du, Tian Jin, Jing Kang, Xianghe Pang, Jingyi Chai, Tingjia Miao, Fenyi Liu, WenHao Wang, Sikai Yao, Yuzhi Zhang, Siheng Chen

arxiv:2605.14306

Published on May 14 · Submitted by yuwendu on May 14 · Shanghai Jiao Tong University

Abstract

AI-generated summary: PaSaMaster is a self-evolving agentic literature retrieval system that improves academic search accuracy and cost efficiency through iterative intent analysis and evidence-based ranking.

As large language models reshape scientific research, literature retrieval faces a twofold challenge: ensuring source authenticity while maintaining a deep comprehension of academic search intents. Traditional keyword-centric search is reliable but fails to capture complex research intents. Frontier LLMs can handle complex research intents, but their high cost and tendency to hallucinate remain key limitations. Here we introduce PaSaMaster, a self-evolving agentic literature retrieval system that produces relevance-scored paper rankings with evidence-grounded recommendations through iterative intent analysis, retrieval, and ranking. It is built on three key designs. First, it transforms literature retrieval from a one-shot query-document matching problem into a search process that evolves over time, using ranked evidence to reveal gaps, refine intents, and guide follow-up searches. Second, it prevents hallucinated sources by treating retrieval as intent-paper relevance ranking rather than generation. Finally, PaSaMaster improves cost efficiency by separating planning from retrieval: a frontier LLM is used only for intent understanding, while large-scale retrieval and relevance scoring are delegated to customized corpora and lightweight models. Evaluated on the PaSaMaster Benchmark across 38 scientific disciplines, our system exposes the severe inaccuracy and incompleteness of traditional keyword retrieval (improving F1-score by 15.6×) and the unreliability of generative LLMs (which exhibit hallucination rates up to 37.79%). Remarkably, PaSaMaster outperforms GPT-5.2 by 30.0% at a mere 1% of the computational cost while ensuring zero source hallucination. Code: https://github.com/sjtu-sai-agents/PaSaMaster
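The abstract describes an iterative loop of intent analysis, retrieval, and ranking, in which gaps in the ranked evidence drive follow-up searches and every result is drawn from a fixed corpus rather than generated. The toy sketch below illustrates that general shape only; it is not the authors' implementation. Every name in it (`Paper`, `plan`, `refine`, `relevance`, `search`, the `rewrites` table, the 0.6 threshold) is a hypothetical stand-in: `plan` and `refine` take the place of the frontier LLM planner, and the word-overlap scorer takes the place of the lightweight relevance models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    paper_id: str
    text: str  # title plus abstract, as stored in the corpus


def tokens(text: str) -> set[str]:
    return set(text.lower().split())


def relevance(intent: str, paper: Paper) -> float:
    """Toy scorer: share of intent words that appear in the paper text."""
    wanted = tokens(intent)
    return len(wanted & tokens(paper.text)) / len(wanted) if wanted else 0.0


def plan(query: str) -> list[str]:
    """Stand-in for the planner: split a query into sub-intents on ' and '."""
    parts = query.lower().split(" and ")
    return [" ".join(p.split()) for p in parts if p.strip()]


def refine(intent: str, rewrites: dict[str, str]) -> str:
    """Stand-in for intent refinement: apply hypothetical term rewrites."""
    return " ".join(rewrites.get(word, word) for word in intent.split())


def search(query, corpus, rewrites, threshold=0.6, max_rounds=3):
    ranked: dict[str, float] = {}  # paper_id -> best score seen so far
    pending = plan(query)
    for _ in range(max_rounds):
        gaps = []
        for intent in pending:
            hits = [(relevance(intent, p), p) for p in corpus]
            hits = [(s, p) for s, p in hits if s >= threshold]
            if not hits:
                gaps.append(intent)  # no evidence yet for this intent
            for s, p in hits:
                ranked[p.paper_id] = max(s, ranked.get(p.paper_id, 0.0))
        # Only intents that actually change after refinement are retried.
        pending = [r for g in gaps if (r := refine(g, rewrites)) != g]
        if not pending:
            break
    # Results are ranked corpus entries, so no paper can be invented.
    return sorted(ranked.items(), key=lambda kv: (-kv[1], kv[0]))


corpus = [
    Paper("P1", "Dense retrieval for scientific papers"),
    Paper("P2", "Hallucination in large language models"),
    Paper("P3", "Protein folding with diffusion"),
]
rewrites = {"llm": "large language models"}
results = search("dense retrieval and llm hallucination", corpus, rewrites)
print(results)  # [('P1', 1.0), ('P2', 1.0)]
```

In this run the first round finds evidence for "dense retrieval" but none for "llm hallucination"; the second round retries the rewritten intent and surfaces P2. Because the output is a ranking over `corpus`, every returned identifier corresponds to a stored paper by construction.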

Get this paper in your agent:

hf papers read 2605.14306

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash
