Hugging Face Daily Papers · May 20, 2026 · 7 min read

ClinSeekAgent: Automating Multimodal Evidence Seeking for Agentic Clinical Reasoning

#multimodal #agents #reasoning #developer-tool

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.


arxiv:2605.20176

Published on May 19 · Submitted by Juncheng Wu on May 22 · UCSC-VLAA

Authors: Juncheng Wu, Letian Zhang, Yuhan Wang, Haoqin Tu, Hardy Chen, Zijun Wang, Cihang Xie, Yuyin Zhou

AI-generated summary

ClinSeekAgent is an automated agentic framework that enables large language models to actively acquire and synthesize multimodal clinical evidence from raw data sources, improving decision-making accuracy in both text-only and multimodal tasks.

Abstract

Large language models (LLMs) and agentic systems have shown promise for clinical decision support, but existing work largely assumes that evidence has already been curated and handed to the model. Real-world clinical workflows instead require agents to actively seek, iteratively plan, and synthesize multimodal evidence from heterogeneous sources. In this paper, we introduce ClinSeekAgent, an automated agentic framework for dynamic multimodal evidence seeking that shifts the paradigm from passive evidence consumption to active evidence acquisition. Given only a clinical query and access to raw data sources, ClinSeekAgent gathers evidence by querying medical knowledge bases, navigating raw EHRs, and invoking medical imaging tools; refines its hypotheses as new information emerges; and integrates the collected evidence into grounded clinical decisions. ClinSeekAgent serves both as an inference-time agent for frontier LLMs and as a training-time pipeline for distilling high-quality agent trajectories into compact open-source models. To validate its inference-time effectiveness, we construct ClinSeek-Bench, which pairs Curated Input reasoning from fixed pre-selected evidence with Automated Evidence-Seeking over raw clinical data. On text-only EHR tasks, ClinSeekAgent improves Claude Opus 4.6 from 60.0 to 63.2 overall F1 and MiniMax M2.5 from 43.1 to 47.3, with positive risk-prediction gains in 7 out of 9 evaluated host models. On multimodal tasks, ClinSeekAgent improves Claude Opus 4.6 from 47.5 to 62.6 (+15.1); all evaluated models improve across the three CXR-related task groups. We further validate ClinSeekAgent as a training pipeline by distilling agentic evidence-seeking trajectories into ClinSeek-35B-A3B, which achieves 34.0 average F1 on the existing AgentEHR-Bench, improving over its Qwen3.5-35B-A3B baseline by +11.9 points and approaching Claude Opus 4.6.
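The headline numbers in the abstract can be sanity-checked with a few lines of arithmetic. The sketch below uses only the figures quoted above; the dictionary keys and function name are illustrative, and the Qwen3.5-35B-A3B baseline score is inferred from the stated +11.9 gain rather than reported directly on this page.

```python
# Scores quoted in the abstract, as (before, with ClinSeekAgent) pairs.
reported = {
    "Claude Opus 4.6 (text-only EHR, overall F1)": (60.0, 63.2),
    "MiniMax M2.5 (text-only EHR, overall F1)": (43.1, 47.3),
    "Claude Opus 4.6 (multimodal)": (47.5, 62.6),
}


def gain(before: float, after: float) -> float:
    """Absolute improvement in points, rounded to one decimal place."""
    return round(after - before, 1)


gains = {name: gain(before, after) for name, (before, after) in reported.items()}
for name, delta in gains.items():
    print(f"{name}: +{delta}")

# Inferred, not reported: the baseline implied by
# "34.0 average F1 ... improving over its baseline by +11.9 points".
implied_qwen_baseline = round(34.0 - 11.9, 1)
print(f"Implied Qwen3.5-35B-A3B baseline on AgentEHR-Bench: {implied_qwen_baseline}")
```

The multimodal gain computed this way matches the +15.1 stated in the abstract.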

Project page: https://ucsc-vlaa.github.io/ClinSeekAgent/ · GitHub: https://github.com/UCSC-VLAA/ClinSeekAgent

Community

Chtholly17 (paper author, paper submitter) · 1 day ago

We introduce ClinSeekAgent, an automated agentic framework for dynamic multimodal evidence seeking in clinical decision support, and ship it with three concrete artifacts:

1. ClinSeekAgent (the pipeline). 20 tools across raw ehr.* tables, browser.* search, and image.* CXR analysis. The agent decides which to invoke and when to stop.
2. ClinSeek-Bench. Each example is paired into Curated Input (the source benchmark's pre-selected evidence) and Automated Evidence-Seeking (only the patient ID + cutoff + tools). Same task, same label, only the access pattern changes.
3. ClinSeek-35B-A3B. SFT of Qwen3.5-35B-A3B on Claude Opus 4.6 trajectories collected from ClinSeekAgent. Open-source state-of-the-art on AgentEHR-Bench, reaching 94.4% of the teacher.

Check out more details at: https://ucsc-vlaa.github.io/ClinSeekAgent/
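To make "the agent decides which to invoke and when to stop" concrete, here is a minimal sketch of such a loop. It is not the authors' implementation: the tool names (`ehr.labs`, `browser.search`, `image.describe_cxr`), the `Step` and `EvidenceSeeker` classes, and the scripted policy are all hypothetical stand-ins, with the policy playing the role an LLM would play in the real system.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Tool = Callable[[str], str]
Evidence = List[Tuple[str, str]]  # (tool name, tool output) pairs


@dataclass
class Step:
    tool: Optional[str]  # None means the policy has decided to stop
    argument: str = ""


@dataclass
class EvidenceSeeker:
    tools: Dict[str, Tool]                   # hypothetical registry keyed by namespace.name
    policy: Callable[[str, Evidence], Step]  # stand-in for the LLM's next-action choice
    max_steps: int = 8                       # hard cap so the loop always terminates

    def run(self, query: str) -> Evidence:
        evidence: Evidence = []
        for _ in range(self.max_steps):
            step = self.policy(query, evidence)
            if step.tool is None:  # the policy chose to stop seeking
                break
            if step.tool not in self.tools:
                evidence.append((step.tool, "error: unknown tool"))
                continue
            evidence.append((step.tool, self.tools[step.tool](step.argument)))
        return evidence


# Stub tools with made-up names and canned outputs, for illustration only.
tools: Dict[str, Tool] = {
    "ehr.labs": lambda patient_id: f"labs for {patient_id}",
    "browser.search": lambda q: f"search results for {q!r}",
    "image.describe_cxr": lambda study_id: f"findings for {study_id}",
}


def scripted_policy(query: str, evidence: Evidence) -> Step:
    """Fixed two-step plan; a real agent would re-plan from the evidence so far."""
    plan = [Step("ehr.labs", "patient-001"), Step("browser.search", query)]
    return plan[len(evidence)] if len(evidence) < len(plan) else Step(None)


seeker = EvidenceSeeker(tools=tools, policy=scripted_policy)
evidence = seeker.run("risk of sepsis?")
for name, output in evidence:
    print(name, "->", output)
```

Keeping the policy as a plain callable means the same loop can be driven by a scripted plan for testing or by a model call in practice.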

Chtholly17 (paper author, paper submitter) · about 7 hours ago

Clinical AI shouldn't just consume evidence handed to it — it should actively seek evidence, e.g., linking multimodal data, analyzing patient context, and retrieving external knowledge to support clinical reasoning 🔎

Introducing ClinSeekAgent — our automated agentic framework for active multimodal evidence seeking in clinical reasoning.

ClinSeekAgent exposes a unified space of 20 tools across 3 sources:
• 11 raw-EHR retrieval tools
• 3 web-search tools
• 6 chest X-ray imaging tools

No fixed retrieval order — the agent plans, acts, and re-plans as new evidence emerges.

We built ClinSeek-Bench to test it: each example is paired — same task, same label — under two settings:
🔒 Curated Input (evidence pre-selected)
🔎 Automated Evidence-Seeking (raw data + tools only)

Results:
• Opus 4.6: 60.0→63.2 (text), 47.5→62.6 (multimodal)
• MiniMax M2.5: 43.1→47.3
• Phenotype reasoning alone: +34.0

Stronger agents → larger gains.
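The paired design described in this comment can be pictured as one record evaluated under two access patterns. The sketch below is an assumption-laden toy: the `PairedExample` fields, the synthetic records, and both predictors are invented for illustration, and plain accuracy stands in for the F1 metric the paper reports.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class PairedExample:
    task: str
    label: str             # shared by both settings
    curated_evidence: str  # Curated Input: evidence pre-selected upstream
    patient_id: str        # Automated Evidence-Seeking: only ID + cutoff (+ tools)
    cutoff: str


# Synthetic raw records keyed by patient ID (not real clinical data).
TOY_RAW_RECORDS: Dict[str, str] = {
    "p1": "lactate elevated; creatinine normal",
    "p2": "vitals normal; troponin elevated",
    "p3": "labs normal",
}

examples: List[PairedExample] = [
    PairedExample("risk", "positive", "lactate elevated", "p1", "2020-01-01"),
    PairedExample("risk", "positive", "vitals normal", "p2", "2020-01-01"),
    PairedExample("risk", "negative", "labs normal", "p3", "2020-01-01"),
]


def predict_curated(ex: PairedExample) -> str:
    """Toy predictor that sees only the pre-selected evidence."""
    return "positive" if "elevated" in ex.curated_evidence else "negative"


def predict_seeking(ex: PairedExample) -> str:
    """Toy predictor that looks up the raw record by patient ID."""
    return "positive" if "elevated" in TOY_RAW_RECORDS[ex.patient_id] else "negative"


def accuracy(predict: Callable[[PairedExample], str], data: List[PairedExample]) -> float:
    return sum(predict(ex) == ex.label for ex in data) / len(data)


curated_score = accuracy(predict_curated, examples)
seeking_score = accuracy(predict_seeking, examples)
print(f"curated={curated_score:.3f} seeking={seeking_score:.3f}")
```

Because task and label are shared within each pair, any score difference in this setup comes from the access pattern alone.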


Get this paper in your agent:

hf papers read 2605.20176

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash
