DeepGut: A collaborative multimodal large language model framework for digestive disease assisted diagnosis and treatment.
Xiao-Han Wan, Mei-Xia Liu, Yan Zhang, Guan-Jun Kou, Lei-Qi Xu, Han Liu, Xiao-Yun Yang, Xiu-Li Zuo, Yan-Qing Li
World Journal of Gastroenterology, 2025-08-21;31(31):109948. DOI: 10.3748/wjg.v31.i31.109948. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12400200/pdf/
Abstract
Background: Gastrointestinal diseases have complex etiologies and clinical presentations. An accurate diagnosis requires physicians to integrate diverse information, including medical history, laboratory test results, and imaging findings. Existing artificial intelligence-assisted diagnostic tools are limited to single-modality information, resulting in recommendations that are often incomplete and may be associated with clinical or legal risks.
Aim: To develop and evaluate a collaborative multimodal large language model (LLM) framework for clinical decision-making in digestive diseases.
Methods: In this observational study, DeepGut, a multimodal LLM collaborative diagnostic framework, was developed to integrate four distinct large models into a four-tiered structure. The framework sequentially accomplishes multimodal information extraction, logical "chain" construction, diagnostic and treatment suggestion generation, and risk analysis. The model was evaluated using objective metrics, which assess the reliability and comprehensiveness of model-generated results, and subjective expert opinions, which examine the effectiveness of the framework in assisting physicians.
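To make the four-tiered structure concrete, the sketch below shows one plausible way such a sequential, collaborative pipeline could be wired together. It is illustrative only: the paper does not publish DeepGut's code, prompts, or model APIs, so the `PatientCase` fields, prompt wording, and the generic `complete(prompt) -> str` callables standing in for the four large models are all assumptions.

```python
# Minimal sketch of a four-tier collaborative LLM pipeline (hypothetical; not the
# authors' implementation). Each tier is backed by a separate model wrapped as a
# plain text-in/text-out callable.
from dataclasses import dataclass
from typing import Callable

LLM = Callable[[str], str]  # one call to a large (multimodal or text) model


@dataclass
class PatientCase:
    history: str           # free-text medical history
    lab_results: str       # serialized laboratory report text
    imaging_findings: str  # radiology / endoscopy report text


def deepgut_style_pipeline(case: PatientCase,
                           extractor: LLM,
                           chain_builder: LLM,
                           recommender: LLM,
                           risk_analyzer: LLM) -> dict:
    """Run the four tiers sequentially, each tier consuming the previous tier's output."""
    # Tier 1: multimodal information extraction
    findings = extractor(
        "Extract the clinically relevant findings from:\n"
        f"History: {case.history}\nLabs: {case.lab_results}\nImaging: {case.imaging_findings}"
    )
    # Tier 2: logical "chain" construction linking findings to candidate diagnoses
    chain = chain_builder(
        f"Build a step-by-step diagnostic reasoning chain from these findings:\n{findings}"
    )
    # Tier 3: diagnostic and treatment suggestion generation
    recommendation = recommender(
        f"Given this reasoning chain, propose a diagnosis and a treatment plan:\n{chain}"
    )
    # Tier 4: risk analysis of the proposed plan
    risks = risk_analyzer(
        f"List the clinical and safety risks of the following plan:\n{recommendation}"
    )
    return {"findings": findings, "chain": chain,
            "recommendation": recommendation, "risks": risks}
```

In such a design, each callable could wrap a different hosted or local model; the abstract attributes DeepGut's gains to exactly this division of labor across models, while noting that chaining model calls increases token usage, cost, and diagnostic time.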
Results: The diagnostic and treatment recommendations generated by the DeepGut framework achieved exceptional performance, with a diagnostic accuracy of 97.8%, diagnostic completeness of 93.9%, treatment plan accuracy of 95.2%, and treatment plan completeness of 98.0%, significantly surpassing the capabilities of single-modal LLM-based diagnostic tools. Experts evaluating the framework commended the completeness, relevance, and logical coherence of its outputs. However, the collaborative multimodal LLM approach resulted in increased input and output token counts, leading to higher computational costs and extended diagnostic times.
Conclusion: The framework achieves successful integration of multimodal diagnostic data, demonstrating enhanced performance enabled by multimodal LLM collaboration, which opens new horizons for the clinical application of artificial intelligence-assisted technology.
Journal introduction:
The primary aims of the WJG are to improve diagnostic, therapeutic and preventive modalities and the skills of clinicians and to guide clinical practice in gastroenterology and hepatology.