SMR-agents: Synergistic medical reasoning agents for zero-shot medical visual question answering with MLLMs

IF 7.4 | Zone 1 (Management) | JCR Q1: COMPUTER SCIENCE, INFORMATION SYSTEMS

Information Processing & Management | Pub Date: 2025-07-22 | DOI: 10.1016/j.ipm.2025.104297

Dujuan Wang, Tao Cheng, Sutong Wang, Youhua (Frank) Chen, Yunqiang Yin

Volume 63 (1), Article 104297. Journal Article. Full text: https://www.sciencedirect.com/science/article/pii/S0306457325002389

Citations: 0

Abstract

SMR-agents: Synergistic medical reasoning agents for zero-shot medical visual question answering with MLLMs

Existing medical visual question answering (Med-VQA) systems often lack transparent reasoning and robustness, limiting their clinical reliability. This study proposes the Synergistic Medical Reasoning Agents (SMR-Agents) framework to address these limitations by simulating collaborative consultation among multidisciplinary medical expert agents, thereby enhancing interpretability and diagnostic reliability. SMR-Agents first constructs a structured medical scene graph from the input image and question to identify and highlight relevant visual features. A pre-trained large language model then acts as a general practitioner, automatically selecting and coordinating a team of specialized medical expert agents based on this scene graph. The recruited experts engage in iterative reasoning: domain-specific diagnostic agents generate initial answer hypotheses, and a group of consulting experts conducts a peer-review discussion to refine the answer and formulate its explanatory rationale. The entire process operates in a zero-shot manner without task-specific training of the models. Evaluation on three public Med-VQA datasets and a private colorectal image dataset demonstrates that SMR-Agents achieves state-of-the-art performance across all benchmarks. Notably, it yields significant improvements in accuracy for open-ended questions and produces more interpretable reasoning compared to existing methods. These results demonstrate that combining structured scene understanding with iterative multi-expert collaboration substantially enhances both the accuracy and transparency of Med-VQA systems. The SMR-Agents framework thus provides a robust, interpretable approach to AI-assisted medical diagnosis, aligning machine reasoning with the expert-driven consultation processes used in clinical practice.
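The four stages named in the abstract (scene graph construction, expert recruitment by a general-practitioner model, initial hypotheses from diagnostic agents, and peer-review refinement) can be illustrated with a minimal Python sketch. This is not the authors' implementation. Everything below is an assumption made for illustration: the `Model` callable that stands in for a multimodal LLM, the `consult` and `stub_model` functions, the role names, the `answer | rationale` reply format, and the fixed number of review rounds. The canned replies are placeholders, not results.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumption: any multimodal LLM is reduced to a callable (role, prompt) -> text.
Model = Callable[[str, str], str]


@dataclass
class Consultation:
    scene_graph: str
    experts: List[str]
    hypotheses: Dict[str, str]
    answer: str
    rationale: str


def consult(image_ref: str, question: str, model: Model, rounds: int = 2) -> Consultation:
    # Stage 1: build a structured scene graph from the image and the question.
    scene_graph = model("scene_graph", f"image={image_ref}\nquestion={question}")

    # Stage 2: a "general practitioner" model selects specialists from the scene graph.
    raw = model("general_practitioner", scene_graph)
    experts = [name.strip() for name in raw.split(",") if name.strip()]

    # Stage 3: each recruited diagnostic agent proposes an initial answer hypothesis.
    hypotheses = {name: model(name, f"{scene_graph}\n{question}") for name in experts}

    # Stage 4: a consulting panel reviews the hypotheses over a few rounds
    # (hypothetical reply format: "answer | rationale").
    answer, rationale = "", ""
    for _round in range(rounds):
        notes = "\n".join(f"{name}: {text}" for name, text in hypotheses.items())
        review = model("consulting_panel", f"{notes}\nprevious={answer}")
        head, _sep, tail = review.partition("|")
        answer, rationale = head.strip(), tail.strip()

    return Consultation(scene_graph, experts, hypotheses, answer, rationale)


def stub_model(role: str, prompt: str) -> str:
    """Canned placeholder replies so the sketch runs without a real model."""
    canned = {
        "scene_graph": "(region_1, shows, finding_A)",
        "general_practitioner": "specialist_a, specialist_b",
        "consulting_panel": "finding_A | both hypotheses agree",
    }
    return canned.get(role, f"{role} suggests finding_A")


result = consult("image_001", "What abnormality is visible?", stub_model)
print(result.experts)
print(result.answer, "-", result.rationale)
```

No model is trained or fine-tuned anywhere in the sketch, which mirrors the zero-shot setting the abstract describes: swapping `stub_model` for a real pre-trained model would change only the callable passed to `consult`.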

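Source journal

Information Processing & Management (Engineering & Technology, Computer Science: Information Systems)

CiteScore: 17.00

Self-citation rate: 11.60%

Articles published: 276

Review time: 39 days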

About the journal: Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing. We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.