Hugging Face Daily Papers · 5 min read

Same Question, Different Source, Different Answer: Auditing Source-Dependence in Medical Multi-Source RAG

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2605.29084

Published on May 27, 2026 · Submitted by yubol-bobo on Jun 2, 2026
Authors: Yubo Li, Rema Padman, Ramayya Krishnan
Organization: Carnegie Mellon University

Abstract

AI-generated summary: Retrieval-augmented generation systems can give different answers to identical queries depending on the retrieved source. Evaluating multi-source NLP systems therefore calls for a shift from answer correctness to the relationships between sources.
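To make the shift in the unit of evaluation concrete, here is a toy Python sketch contrasting the two views. It assumes per-source answers have already been generated for a single question. The source names, the answer strings, and the exact-match `normalize` comparison are all hypothetical stand-ins; a real audit would compare answers with a judge model rather than string equality. This is not the authors' method.

```python
from itertools import combinations


def normalize(text: str) -> str:
    # Crude stand-in for semantic comparison: lowercase and collapse whitespace.
    return " ".join(text.lower().split())


def gold_accuracy(answers: dict[str, str], gold: str) -> float:
    """Single-gold-answer view: share of per-source answers that match one gold answer."""
    if not answers:
        return 0.0
    hits = sum(normalize(a) == normalize(gold) for a in answers.values())
    return hits / len(answers)


def pairwise_disagreements(answers: dict[str, str]) -> list[tuple[str, str]]:
    """Inter-source view: every pair of sources whose answers differ from each other."""
    return [
        (s1, s2)
        for s1, s2 in combinations(sorted(answers), 2)
        if normalize(answers[s1]) != normalize(answers[s2])
    ]


# Toy data: hypothetical handbooks with placeholder answers, not real clinical content.
answers = {"handbook_a": "Answer X", "handbook_b": "answer x", "handbook_c": "Answer Y"}
print(gold_accuracy(answers, gold="Answer X"))  # 0.6666666666666666
print(pairwise_disagreements(answers))  # [('handbook_a', 'handbook_c'), ('handbook_b', 'handbook_c')]
```

If the majority answer is taken as gold, the single-gold view reports 2/3 accuracy and says nothing more. The inter-source view surfaces the two disagreeing source pairs without needing to decide which answer is "right".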

A retrieval-augmented generation (RAG) system deployed over a multi-author institutional corpus can give different answers to the same question depending on which source it retrieves. The dominant single-gold-answer paradigm cannot diagnose this failure mode. We argue that source-dependence is a missing axis of NLP evaluation, and that auditing it requires shifting the unit of evaluation from answer correctness to the inter-source relationship. We make this concrete in transplant patient education, where institutional sources demonstrably disagree, and release three artefacts: TransplantQA, a benchmark of real patient questions, each answered by grounding generation in multiple institutional handbooks as candidate sources; HERO-QA, a hierarchical retrieval strategy that grounds and audits each answer; and a structured-output judge that scores inter-source relationships on a validated 5-label taxonomy. At scale, better retrieval reveals far more disagreement than prior estimates suggested: those estimates understated its prevalence, not its intensity. The framework is domain-agnostic and transfers to legal and educational RAG. Measuring source-dependence is therefore a responsibility for deployed multi-source NLP systems generally.
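The structured-output judge and the distinction between prevalence and intensity can be illustrated with a minimal Python sketch. It assumes the judge model returns one JSON object per source pair; the judge call itself is not shown. The five `Relation` values are hypothetical placeholders, because the abstract does not name the taxonomy's labels. The JSON field names and the choice of which labels count as disagreement are also assumptions.

```python
import json
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    # HYPOTHETICAL placeholder labels for a 5-label inter-source taxonomy;
    # the paper's actual labels are not reproduced here.
    EQUIVALENT = "equivalent"
    COMPLEMENTARY = "complementary"
    PARTIAL_CONFLICT = "partial_conflict"
    CONTRADICTORY = "contradictory"
    NOT_COMPARABLE = "not_comparable"


# Assumption: which labels count as "disagreement" when measuring prevalence.
DISAGREEMENT = frozenset({Relation.PARTIAL_CONFLICT, Relation.CONTRADICTORY})


@dataclass(frozen=True)
class Verdict:
    question_id: str
    source_a: str
    source_b: str
    relation: Relation


def parse_verdict(raw: str) -> Verdict:
    """Validate one structured-output judge response (hypothetical JSON schema)."""
    obj = json.loads(raw)
    return Verdict(
        question_id=obj["question_id"],
        source_a=obj["source_a"],
        source_b=obj["source_b"],
        relation=Relation(obj["relation"]),  # raises ValueError on an off-taxonomy label
    )


def prevalence(verdicts: list[Verdict]) -> float:
    """Fraction of audited questions with at least one disagreeing source pair."""
    questions = {v.question_id for v in verdicts}
    disputed = {v.question_id for v in verdicts if v.relation in DISAGREEMENT}
    return len(disputed) / len(questions) if questions else 0.0


def intensity_profile(verdicts: list[Verdict]) -> Counter:
    """Label counts among disagreeing pairs only: what kind of disagreement, not how often."""
    return Counter(v.relation for v in verdicts if v.relation in DISAGREEMENT)


# Toy judge outputs for two questions over hypothetical handbooks h1..h3.
raw_outputs = [
    '{"question_id": "q1", "source_a": "h1", "source_b": "h2", "relation": "equivalent"}',
    '{"question_id": "q1", "source_a": "h1", "source_b": "h3", "relation": "contradictory"}',
    '{"question_id": "q2", "source_a": "h1", "source_b": "h2", "relation": "complementary"}',
]
verdicts = [parse_verdict(r) for r in raw_outputs]
print(prevalence(verdicts))  # 0.5
print(intensity_profile(verdicts))  # Counter({<Relation.CONTRADICTORY: 'contradictory'>: 1})
```

Keeping `prevalence` (how many questions show any disagreement) separate from `intensity_profile` (which kinds of disagreement occur) lets an audit make the claim the abstract does: prior estimates understated how often sources disagree, not how severely. Validating labels against an enum means an off-taxonomy judge output fails loudly instead of being counted silently.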

Community

Paper submitter about 11 hours ago

v1

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

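The following papers were recommended by the Semantic Scholar API:

* AuthTrace: Diagnosing Evidence Construction in Thematically Dense Single-Author Corpora (2026), https://huggingface.co/papers/2605.25382
* Benchmarking Retrieval Strategies for Biomedical Retrieval-Augmented Generation: A Controlled Empirical Study (2026), https://huggingface.co/papers/2605.02520
* DrugClaw and DrugAudit: A Primary-Source-Grounded Agent and Authority-Aware Benchmark for Drug-Information Question Answering (2026), https://huggingface.co/papers/2606.01434
* ASTRA-QA: A Benchmark for Abstract Question Answering over Documents (2026), https://huggingface.co/papers/2605.10168
* BLUEmed: Retrieval-Augmented Multi-Agent Debate for Clinical Error Detection (2026), https://huggingface.co/papers/2604.10389
* Iterative Multimodal Retrieval-Augmented Generation for Medical Question Answering (2026), https://huggingface.co/papers/2604.27724
* Same Ranking, Different Winner: How Scoring Targets Shape LLM Memory Benchmarks (2026), https://huggingface.co/papers/2605.24060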

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out the librarian-bots/recommend_similar_papers Space.

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend
