v1</p>\n","updatedAt":"2026-06-02T14:43:38.079Z","author":{"_id":"63578f828ed056fa1cccb7a4","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/63578f828ed056fa1cccb7a4/WH1M3yyAwl9AcZdnRZqyj.png","fullname":"yubol-bobo","name":"yubol","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"pt","probability":0.9997303485870361},"editors":["yubol"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/63578f828ed056fa1cccb7a4/WH1M3yyAwl9AcZdnRZqyj.png"],"reactions":[],"isReport":false}},{"id":"6a1f8a5fb4f9cc018c082110","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":360,"isUserFollowing":false},"createdAt":"2026-06-03T01:58:55.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"This is an automated message from the [Librarian Bot](https://huggingface.co/librarian-bots). I found the following papers similar to this paper. \n\nThe following papers were recommended by the Semantic Scholar API \n\n* [AuthTrace: Diagnosing Evidence Construction in Thematically Dense Single-Author Corpora](https://huggingface.co/papers/2605.25382) (2026)\n* [Benchmarking Retrieval Strategies for Biomedical Retrieval-Augmented Generation: A Controlled Empirical Study](https://huggingface.co/papers/2605.02520) (2026)\n* [DrugClaw and DrugAudit: A Primary-Source-Grounded Agent and Authority-Aware Benchmark for Drug-Information Question Answering](https://huggingface.co/papers/2606.01434) (2026)\n* [ASTRA-QA: A Benchmark for Abstract Question Answering over Documents](https://huggingface.co/papers/2605.10168) (2026)\n* [BLUEmed: Retrieval-Augmented Multi-Agent Debate for Clinical Error Detection](https://huggingface.co/papers/2604.10389) (2026)\n* [Iterative Multimodal Retrieval-Augmented Generation for Medical Question Answering](https://huggingface.co/papers/2604.27724) (2026)\n* [Same Ranking, Different Winner: How Scoring Targets Shape LLM Memory Benchmarks](https://huggingface.co/papers/2605.24060) (2026)\n\n\n Please give a thumbs up to this comment if you found it helpful!\n\n If you want recommendations for any Paper on Hugging Face checkout [this](https://huggingface.co/spaces/librarian-bots/recommend_similar_papers) Space\n\n You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: `@librarian-bot recommend`","html":"<p>This is an automated message from the <a href=\"https://huggingface.co/librarian-bots\">Librarian Bot</a>. I found the following papers similar to this paper. </p>\n<p>The following papers were recommended by the Semantic Scholar API </p>\n<ul>\n<li><a href=\"https://huggingface.co/papers/2605.25382\">AuthTrace: Diagnosing Evidence Construction in Thematically Dense Single-Author Corpora</a> (2026)</li>\n<li><a href=\"https://huggingface.co/papers/2605.02520\">Benchmarking Retrieval Strategies for Biomedical Retrieval-Augmented Generation: A Controlled Empirical Study</a> (2026)</li>\n<li><a href=\"https://huggingface.co/papers/2606.01434\">DrugClaw and DrugAudit: A Primary-Source-Grounded Agent and Authority-Aware Benchmark for Drug-Information Question Answering</a> (2026)</li>\n<li><a href=\"https://huggingface.co/papers/2605.10168\">ASTRA-QA: A Benchmark for Abstract Question Answering over Documents</a> (2026)</li>\n<li><a href=\"https://huggingface.co/papers/2604.10389\">BLUEmed: Retrieval-Augmented Multi-Agent Debate for Clinical Error Detection</a> (2026)</li>\n<li><a href=\"https://huggingface.co/papers/2604.27724\">Iterative Multimodal Retrieval-Augmented Generation for Medical Question Answering</a> (2026)</li>\n<li><a href=\"https://huggingface.co/papers/2605.24060\">Same Ranking, Different Winner: How Scoring Targets Shape LLM Memory Benchmarks</a> (2026)</li>\n</ul>\n<p> Please give a thumbs up to this comment if you found it helpful!</p>\n<p> If you want recommendations for any Paper on Hugging Face checkout <a href=\"https://huggingface.co/spaces/librarian-bots/recommend_similar_papers\">this</a> Space</p>\n<p> You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: <code><span class=\"SVELTE_PARTIAL_HYDRATER contents\" data-target=\"UserMention\" data-props=\"{"user":"librarian-bot"}\"><span class=\"inline-block\"><span class=\"contents\"><a href=\"/librarian-bot\">@<span class=\"underline\">librarian-bot</span></a></span> </span></span> recommend</code></p>\n","updatedAt":"2026-06-03T01:58:55.904Z","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":360,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.7099631428718567},"editors":["librarian-bot"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2605.29084","authors":[{"_id":"6a1eec0fe292c1c78ecb10fb","name":"Yubo Li","hidden":false},{"_id":"6a1eec0fe292c1c78ecb10fc","name":"Rema Padman","hidden":false},{"_id":"6a1eec0fe292c1c78ecb10fd","name":"Ramayya Krishnan","hidden":false}],"publishedAt":"2026-05-27T00:00:00.000Z","submittedOnDailyAt":"2026-06-02T00:00:00.000Z","title":"Same Question, Different Source, Different Answer: Auditing Source-Dependence in Medical Multi-Source RAG","submittedOnDailyBy":{"_id":"63578f828ed056fa1cccb7a4","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/63578f828ed056fa1cccb7a4/WH1M3yyAwl9AcZdnRZqyj.png","isPro":false,"fullname":"yubol-bobo","user":"yubol","type":"user","name":"yubol"},"summary":"A retrieval-augmented generation (RAG) system deployed over a multi-author institutional corpus can give a different answer to the same question depending on which source it retrieves -- a failure mode the dominant single-gold-answer paradigm cannot diagnose. We argue that source-dependence is a missing axis of NLP evaluation, and that auditing it means shifting the unit of evaluation from answer correctness to the inter-source relationship. We make this concrete in transplant patient education, where institutional sources demonstrably disagree, releasing three artefacts: TransplantQA, a benchmark of real patient questions, each answered by grounding generation in multiple institutional handbooks as candidate sources; HERO-QA, a hierarchical retrieval strategy that grounds and audits each answer; and a structured-output judge that scores inter-source relationships on a validated 5-label taxonomy. At scale, better retrieval reveals far more disagreement than prior estimates suggested -- understating its prevalence, not its intensity. The framework is domain-agnostic and transfers to legal and educational RAG: measuring source-dependence is a responsibility for deployed multi-source NLP generally.","upvotes":1,"discussionId":"6a1eec0fe292c1c78ecb10fe","ai_summary":"Retrieval-augmented generation systems exhibit source-dependent responses to identical queries, necessitating a shift from traditional correctness evaluation to analyzing inter-source relationships for multi-source NLP systems.","ai_keywords":["retrieval-augmented generation","source-dependence","NLP evaluation","grounding generation","hierarchical retrieval","structured-output judge","multi-source NLP"],"ai_summary_model":"Qwen/Qwen2.5-Coder-32B-Instruct","organization":{"_id":"691d9a1012cc4d473e1c862f","name":"CarnegieMellonU","fullname":"Carnegie Mellon University","avatar":"https://cdn-avatars.huggingface.co/v1/production/uploads/68e396f2b5bb631e9b2fac9a/6I146aJvxxlRCEbYFFAeQ.png"}},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"63578f828ed056fa1cccb7a4","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/63578f828ed056fa1cccb7a4/WH1M3yyAwl9AcZdnRZqyj.png","isPro":false,"fullname":"yubol-bobo","user":"yubol","type":"user"}],"acceptLanguages":["en"],"dailyPaperRank":0,"organization":{"_id":"691d9a1012cc4d473e1c862f","name":"CarnegieMellonU","fullname":"Carnegie Mellon University","avatar":"https://cdn-avatars.huggingface.co/v1/production/uploads/68e396f2b5bb631e9b2fac9a/6I146aJvxxlRCEbYFFAeQ.png"},"markdownContentUrl":"https://huggingface.co/buckets/huggingchat/papers-content/resolve/2605/2605.29084.md"}">
Same Question, Different Source, Different Answer: Auditing Source-Dependence in Medical Multi-Source RAG
Abstract
Retrieval-augmented generation systems exhibit source-dependent responses to identical queries, necessitating a shift from traditional correctness evaluation to analyzing inter-source relationships for multi-source NLP systems.
A retrieval-augmented generation (RAG) system deployed over a multi-author institutional corpus can give a different answer to the same question depending on which source it retrieves -- a failure mode the dominant single-gold-answer paradigm cannot diagnose. We argue that source-dependence is a missing axis of NLP evaluation, and that auditing it means shifting the unit of evaluation from answer correctness to the inter-source relationship. We make this concrete in transplant patient education, where institutional sources demonstrably disagree, releasing three artefacts: TransplantQA, a benchmark of real patient questions, each answered by grounding generation in multiple institutional handbooks as candidate sources; HERO-QA, a hierarchical retrieval strategy that grounds and audits each answer; and a structured-output judge that scores inter-source relationships on a validated 5-label taxonomy. At scale, better retrieval reveals far more disagreement than prior estimates suggested -- understating its prevalence, not its intensity. The framework is domain-agnostic and transfers to legal and educational RAG: measuring source-dependence is a responsibility for deployed multi-source NLP generally.
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any Paper on Hugging Face checkout this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend
Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.
Tap or paste here to upload images
Cite arxiv.org/abs/2605.29084 in a model README.md to link it from this page.
Cite arxiv.org/abs/2605.29084 in a dataset README.md to link it from this page.
Cite arxiv.org/abs/2605.29084 in a Space README.md to link it from this page.
Discussion (0)
Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.
Sign in →No comments yet. Sign in and be the first to say something.