Major agents robotics

ClinSeekAgent: Automating Multimodal Evidence Seeking for Agentic Clinical Reasoning

Juncheng Wu, Letian Zhang, Yuhan Wang, Haoqin Tu, Hardy Chen, Zijun Wang

Published
May 19, 2026 — 17:58 UTC

Problem
This paper addresses the gap in clinical decision support systems where existing models primarily rely on pre-curated evidence, limiting their applicability in real-world scenarios. The authors propose ClinSeekAgent, an automated agentic framework designed to actively seek and synthesize multimodal evidence from heterogeneous sources, rather than passively consuming provided data. This work is presented as a preprint and has not yet undergone peer review.

Method
ClinSeekAgent operates as a dynamic evidence-seeking agent that utilizes a combination of querying medical knowledge bases, navigating electronic health records (EHRs), and invoking medical imaging tools. The architecture allows for iterative hypothesis refinement as new information is acquired. The framework serves dual purposes: it functions as an inference-time agent for large language models (LLMs) and as a training-time pipeline for distilling high-quality agent trajectories into compact models. The authors introduce ClinSeek-Bench, a benchmark that contrasts curated input reasoning with automated evidence-seeking capabilities. The training compute details are not disclosed, but the model is designed to distill agentic trajectories into a new model, ClinSeek-35B-A3B.

Results
ClinSeekAgent demonstrates significant improvements over baseline models on both text-only and multimodal tasks. For text-only EHR tasks, it enhances the overall F1 score of Claude Opus 4.6 from 60.0 to 63.2 and MiniMax M2.5 from 43.1 to 47.3, achieving positive risk-prediction gains in 7 out of 9 evaluated host models. In multimodal tasks, ClinSeekAgent boosts Claude Opus 4.6’s F1 score from 47.5 to 62.6, marking a substantial increase of +15.1. All evaluated models show improvements across three chest X-ray (CXR)-related task groups. The distilled model, ClinSeek-35B-A3B, achieves an average F1 score of 34.0 on the AgentEHR-Bench, surpassing its Qwen3.5-35B-A3B baseline by +11.9 points and nearing the performance of Claude Opus 4.6.

Limitations
The authors acknowledge that ClinSeekAgent’s performance is contingent on the quality and accessibility of the underlying data sources. They do not address potential biases in the data or the implications of model interpretability in clinical settings. Additionally, the reliance on multimodal data may introduce complexity in deployment, particularly in environments with limited access to certain modalities.

Why it matters
ClinSeekAgent represents a significant advancement in the field of clinical AI by enabling active evidence acquisition, which is crucial for real-time clinical decision-making. This framework not only enhances the capabilities of existing LLMs but also sets a precedent for future research in agentic systems that require dynamic interaction with diverse data sources. The implications extend to improving patient outcomes through more informed clinical reasoning and decision support, paving the way for more autonomous clinical agents in healthcare.

Authors: Juncheng Wu, Letian Zhang, Yuhan Wang, Haoqin Tu, Hardy Chen, Zijun Wang, Cihang Xie, Yuyin Zhou
Source: arXiv:2605.20176
URL: https://arxiv.org/abs/2605.20176v1

Turing Wire

By Turing Wire editorial staff · May 19, 2026 · Editorial standards →

Source: arXiv cs.CL