Dujuan Wang, Tao Cheng, Sutong Wang, Youhua (Frank) Chen, Yunqiang Yin
Information Processing & Management, Volume 63, Issue 1, Article 104297. Published 2025-07-22. DOI: 10.1016/j.ipm.2025.104297
https://www.sciencedirect.com/science/article/pii/S0306457325002389
SMR-agents: Synergistic medical reasoning agents for zero-shot medical visual question answering with MLLMs
Existing medical visual question answering (Med-VQA) systems often lack transparent reasoning and robustness, limiting their clinical reliability. This study proposes the Synergistic Medical Reasoning Agents (SMR-Agents) framework to address these limitations by simulating collaborative consultation among multidisciplinary medical expert agents, thereby enhancing interpretability and diagnostic reliability. SMR-Agents first constructs a structured medical scene graph from the input image and question to identify and highlight relevant visual features. A pre-trained large language model then acts as a general practitioner, automatically selecting and coordinating a team of specialized medical expert agents based on this scene graph. The recruited experts engage in iterative reasoning: domain-specific diagnostic agents generate initial answer hypotheses, and a group of consulting experts conducts a peer-review discussion to refine the answer and formulate its explanatory rationale. The entire process operates in a zero-shot manner without task-specific training of the models. Evaluation on three public Med-VQA datasets and a private colorectal image dataset demonstrates that SMR-Agents achieves state-of-the-art performance across all benchmarks. Notably, it yields significant improvements in accuracy for open-ended questions and produces more interpretable reasoning compared to existing methods. These results demonstrate that combining structured scene understanding with iterative multi-expert collaboration substantially enhances both the accuracy and transparency of Med-VQA systems. The SMR-Agents framework thus provides a robust, interpretable approach to AI-assisted medical diagnosis, aligning machine reasoning with the expert-driven consultation processes used in clinical practice.
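The consultation flow described in the abstract (scene graph, expert recruitment by a general-practitioner model, initial hypotheses, peer-review refinement) can be pictured with a minimal sketch. This is not the authors' implementation: the function names, the prompt prefixes, the `FINAL:` convention, the triple format of the scene graph, and the canned `stub_llm` replies are all assumptions made here so that the example runs offline without any model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for an (M)LLM call: prompt in, text out.
LLM = Callable[[str], str]


@dataclass
class SceneGraph:
    # Assumed format: (subject, relation, object) triples plus the question.
    triples: List[Tuple[str, str, str]]
    question: str

    def describe(self) -> str:
        facts = "; ".join(f"{s} {r} {o}" for s, r, o in self.triples)
        return f"Question: {self.question} | Findings: {facts}"


def recruit_experts(graph: SceneGraph, llm: LLM) -> List[str]:
    """The 'general practitioner' model names the specialties to consult."""
    reply = llm(f"RECRUIT\n{graph.describe()}")
    return [name.strip() for name in reply.split(",") if name.strip()]


def diagnose(expert: str, graph: SceneGraph, llm: LLM) -> str:
    """One diagnostic agent proposes an initial answer hypothesis."""
    return llm(f"DIAGNOSE as {expert}\n{graph.describe()}")


def peer_review(hypotheses: Dict[str, str], graph: SceneGraph, llm: LLM,
                max_rounds: int = 3) -> Tuple[str, str, int]:
    """Refine the draft until a reviewer reply starts with 'FINAL:' (assumed convention)."""
    draft = "\n".join(f"{e}: {h}" for e, h in hypotheses.items())
    for round_no in range(1, max_rounds + 1):
        reply = llm(f"REVIEW round {round_no}\n{graph.describe()}\n{draft}")
        if reply.startswith("FINAL:"):
            answer, _, rationale = reply[len("FINAL:"):].partition("|")
            return answer.strip(), rationale.strip(), round_no
        draft = reply  # carry the reviewers' comments into the next round
    return draft.strip(), "no consensus within round limit", max_rounds


def answer_question(graph: SceneGraph, llm: LLM, max_rounds: int = 3) -> dict:
    experts = recruit_experts(graph, llm)
    hypotheses = {e: diagnose(e, graph, llm) for e in experts}
    answer, rationale, rounds = peer_review(hypotheses, graph, llm, max_rounds)
    return {"experts": experts, "hypotheses": hypotheses,
            "answer": answer, "rationale": rationale, "rounds": rounds}


def stub_llm(prompt: str) -> str:
    """Canned replies for illustration only; not a real model and not real data."""
    if prompt.startswith("RECRUIT"):
        return "radiologist, pulmonologist"
    if prompt.startswith("DIAGNOSE"):
        return "possible pneumonia"
    if prompt.startswith("REVIEW round 1"):
        return "Both experts suggest pneumonia; check laterality."
    return "FINAL: pneumonia | opacity in the right lower lobe supports it"


graph = SceneGraph([("opacity", "located in", "right lower lobe")],
                   "What abnormality is shown?")
result = answer_question(graph, stub_llm)
print(result["answer"], result["rounds"])
```

No step in the sketch involves training, which mirrors the zero-shot setting the abstract describes; in a real system `stub_llm` would be replaced by calls to a pre-trained model, and the scene graph would be extracted from the image rather than supplied by hand.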
About the journal:
Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing.
We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.