Hugging Face Daily Papers · June 2, 2026 · 5 min read

Same Question, Different Source, Different Answer: Auditing Source-Dependence in Medical Multi-Source RAG

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2605.29084

Published on May 27, 2026 · Submitted by yubol-bobo on Jun 2, 2026

Carnegie Mellon University

Authors: Yubo Li, Rema Padman, Ramayya Krishnan

Abstract

Retrieval-augmented generation systems exhibit source-dependent responses to identical queries, necessitating a shift from traditional correctness evaluation to analyzing inter-source relationships for multi-source NLP systems.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

A retrieval-augmented generation (RAG) system deployed over a multi-author institutional corpus can give a different answer to the same question depending on which source it retrieves -- a failure mode the dominant single-gold-answer paradigm cannot diagnose. We argue that source-dependence is a missing axis of NLP evaluation, and that auditing it means shifting the unit of evaluation from answer correctness to the inter-source relationship. We make this concrete in transplant patient education, where institutional sources demonstrably disagree, releasing three artefacts: TransplantQA, a benchmark of real patient questions, each answered by grounding generation in multiple institutional handbooks as candidate sources; HERO-QA, a hierarchical retrieval strategy that grounds and audits each answer; and a structured-output judge that scores inter-source relationships on a validated 5-label taxonomy. At scale, better retrieval reveals far more disagreement than prior estimates suggested: those estimates understated its prevalence, not its intensity. The framework is domain-agnostic and transfers to legal and educational RAG: measuring source-dependence is a responsibility for deployed multi-source NLP generally.
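
To make the idea of auditing inter-source relationships more concrete, here is a minimal Python sketch of what such an audit loop might look like. It is not the authors' implementation. The abstract mentions a 5-label taxonomy but does not list the labels, so the label names below are hypothetical placeholders, as are the function names and the toy judge, which stands in for the paper's structured-output judge.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, List, Tuple

# HYPOTHETICAL labels: the abstract does not enumerate its 5-label taxonomy.
LABELS = ("agree", "complementary", "partial_conflict", "conflict", "unrelated")
# ASSUMPTION: which labels count as "disagreement" for the prevalence metric.
DISAGREEMENT = {"partial_conflict", "conflict"}

# A judge maps (question, answer_a, answer_b) to one label.
Judge = Callable[[str, str, str], str]
PairLabels = Dict[Tuple[str, str], str]


def audit_question(question: str, answers_by_source: Dict[str, str], judge: Judge) -> PairLabels:
    """Label every unordered pair of source-grounded answers for one question."""
    results: PairLabels = {}
    for src_a, src_b in combinations(sorted(answers_by_source), 2):
        label = judge(question, answers_by_source[src_a], answers_by_source[src_b])
        if label not in LABELS:
            raise ValueError(f"judge returned unknown label: {label!r}")
        results[(src_a, src_b)] = label
    return results


def disagreement_prevalence(audits: List[PairLabels]) -> float:
    """Fraction of questions with at least one disagreeing source pair."""
    if not audits:
        return 0.0
    flagged = sum(
        1 for pairs in audits if any(label in DISAGREEMENT for label in pairs.values())
    )
    return flagged / len(audits)


def label_counts(audits: List[PairLabels]) -> Counter:
    """Count how often each label occurs across all source pairs."""
    return Counter(label for pairs in audits for label in pairs.values())


def toy_judge(question: str, answer_a: str, answer_b: str) -> str:
    """Placeholder for an LLM judge: case-insensitive exact match, else conflict."""
    same = answer_a.strip().lower() == answer_b.strip().lower()
    return "agree" if same else "conflict"


if __name__ == "__main__":
    # Toy data only; the strings carry no medical meaning.
    audits = [
        audit_question("toy question 1", {"handbook_a": "Yes", "handbook_b": "yes "}, toy_judge),
        audit_question(
            "toy question 2",
            {"handbook_a": "yes", "handbook_b": "no", "handbook_c": "yes"},
            toy_judge,
        ),
    ]
    print(disagreement_prevalence(audits))
    print(label_counts(audits))
```

The sketch reports prevalence (how many questions show any disagreement) separately from the per-pair label counts, mirroring the abstract's distinction between how often sources disagree and how strongly. The unit being scored is the pair of sources, not a single gold answer.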

Community

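librarian-bot

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

* AuthTrace: Diagnosing Evidence Construction in Thematically Dense Single-Author Corpora (2026): https://huggingface.co/papers/2605.25382
* Benchmarking Retrieval Strategies for Biomedical Retrieval-Augmented Generation: A Controlled Empirical Study (2026): https://huggingface.co/papers/2605.02520
* DrugClaw and DrugAudit: A Primary-Source-Grounded Agent and Authority-Aware Benchmark for Drug-Information Question Answering (2026): https://huggingface.co/papers/2606.01434
* ASTRA-QA: A Benchmark for Abstract Question Answering over Documents (2026): https://huggingface.co/papers/2605.10168
* BLUEmed: Retrieval-Augmented Multi-Agent Debate for Clinical Error Detection (2026): https://huggingface.co/papers/2604.10389
* Iterative Multimodal Retrieval-Augmented Generation for Medical Question Answering (2026): https://huggingface.co/papers/2604.27724
* Same Ranking, Different Winner: How Scoring Targets Shape LLM Memory Benchmarks (2026): https://huggingface.co/papers/2605.24060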

Get this paper in your agent:

hf papers read 2605.29084

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash
