DeepGut: A collaborative multimodal large language model framework for digestive disease assisted diagnosis and treatment.
Xiao-Han Wan, Mei-Xia Liu, Yan Zhang, Guan-Jun Kou, Lei-Qi Xu, Han Liu, Xiao-Yun Yang, Xiu-Li Zuo, Yan-Qing Li
World Journal of Gastroenterology, 2025-08-21;31(31):109948. DOI: 10.3748/wjg.v31.i31.109948. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12400200/pdf/
Abstract
Background: Gastrointestinal diseases have complex etiologies and clinical presentations. An accurate diagnosis requires physicians to integrate diverse information, including medical history, laboratory test results, and imaging findings. Existing artificial intelligence-assisted diagnostic tools are limited to single-modality information, resulting in recommendations that are often incomplete and may be associated with clinical or legal risks.
Aim: To develop and evaluate a collaborative multimodal large language model (LLM) framework for clinical decision-making in digestive diseases.
Methods: In this observational study, DeepGut, a multimodal LLM collaborative diagnostic framework, was developed to integrate four distinct large models into a four-tiered structure. The framework sequentially accomplishes multimodal information extraction, logical "chain" construction, diagnostic and treatment suggestion generation, and risk analysis. The model was evaluated using objective metrics, which assess the reliability and comprehensiveness of model-generated results, and subjective expert opinions, which examine the effectiveness of the framework in assisting physicians.
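To make the four-tiered structure concrete, the sketch below shows one plausible way such a sequential, collaborative pipeline could be wired together. It is illustrative only: the paper does not publish DeepGut's code, prompts, or model APIs, so the `PatientCase` fields, prompt wording, and the generic `complete(prompt) -> str` callables standing in for the four large models are all assumptions.

```python
# Minimal sketch of a four-tier collaborative LLM pipeline (hypothetical; not the
# authors' implementation). Each tier is backed by a separate model wrapped as a
# plain text-in/text-out callable.
from dataclasses import dataclass
from typing import Callable

LLM = Callable[[str], str]  # one call to a large (multimodal or text) model


@dataclass
class PatientCase:
    history: str           # free-text medical history
    lab_results: str       # serialized laboratory report text
    imaging_findings: str  # radiology / endoscopy report text


def deepgut_style_pipeline(case: PatientCase,
                           extractor: LLM,
                           chain_builder: LLM,
                           recommender: LLM,
                           risk_analyzer: LLM) -> dict:
    """Run the four tiers sequentially, each tier consuming the previous tier's output."""
    # Tier 1: multimodal information extraction
    findings = extractor(
        "Extract the clinically relevant findings from:\n"
        f"History: {case.history}\nLabs: {case.lab_results}\nImaging: {case.imaging_findings}"
    )
    # Tier 2: logical "chain" construction linking findings to candidate diagnoses
    chain = chain_builder(
        f"Build a step-by-step diagnostic reasoning chain from these findings:\n{findings}"
    )
    # Tier 3: diagnostic and treatment suggestion generation
    recommendation = recommender(
        f"Given this reasoning chain, propose a diagnosis and a treatment plan:\n{chain}"
    )
    # Tier 4: risk analysis of the proposed plan
    risks = risk_analyzer(
        f"List the clinical and safety risks of the following plan:\n{recommendation}"
    )
    return {"findings": findings, "chain": chain,
            "recommendation": recommendation, "risks": risks}
```

In such a design, each callable could wrap a different hosted or local model; the abstract attributes DeepGut's gains to exactly this division of labor across models, while noting that chaining model calls increases token usage, cost, and diagnostic time.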
Results: The diagnostic and treatment recommendations generated by the DeepGut framework achieved exceptional performance, with a diagnostic accuracy of 97.8%, diagnostic completeness of 93.9%, treatment plan accuracy of 95.2%, and treatment plan completeness of 98.0%, significantly surpassing the capabilities of single-modal LLM-based diagnostic tools. Experts evaluating the framework commended the completeness, relevance, and logical coherence of its outputs. However, the collaborative multimodal LLM approach resulted in increased input and output token counts, leading to higher computational costs and extended diagnostic times.
Conclusion: The framework achieves successful integration of multimodal diagnostic data, demonstrating enhanced performance enabled by multimodal LLM collaboration, which opens new horizons for the clinical application of artificial intelligence-assisted technology.
Journal introduction:
The primary aims of the WJG are to improve diagnostic, therapeutic and preventive modalities and the skills of clinicians and to guide clinical practice in gastroenterology and hepatology.