Evaluating the performance of ChatGPT in clinical multidisciplinary treatment: a retrospective study.

IF 3.8 | CAS Tier 3 (Medicine) | JCR Q2, Medical Informatics
Xueqi Wang, Jianhua Guo, Tao Zhang, Huajun Lu, Dandan Zhou, Haitao Zhang, Xuebin Wang
{"title":"Evaluating the performance of ChatGPT in clinical multidisciplinary treatment: a retrospective study.","authors":"Xueqi Wang, Jianhua Guo, Tao Zhang, Huajun Lu, Dandan Zhou, Haitao Zhang, Xuebin Wang","doi":"10.1186/s12911-025-03181-7","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Multidisciplinary treatment (MDT) consultations are essential for managing complex patients. However, resource and time constraints can limit their quality. Large language models (LLMs) have shown potential in assisting clinical decision-making, but their performance in complex MDT scenarios remains unclear. This study aims to evaluate the quality of MDT recommendations generated by ChatGPT compared to those provided by physicians.</p><p><strong>Methods: </strong>Clinical data from 64 patient cases were retrospectively included in the study. ChatGPT was asked to provide specific MDT recommendations. 2 experienced physicians evaluated and scored the responses in a blinded manner across 5 aspects: comprehensiveness, accuracy, feasibility, safety, and efficiency, each assessed by 2 questions.</p><p><strong>Results: </strong>The median overall score for ChatGPT was 41.0 out of 50.0, which was lower than the MDT physicians' median score of 43.5 (p = 0.001). Compared to the MDT physicians' responses, ChatGPT excelled in comprehensiveness (p < 0.001) but fell short in accuracy (p < 0.001), feasibility (p < 0.001), and efficiency (p = 0.003). Analysis of specific questions revealed that ChatGPT lacked the ability to reason through the etiologies of complex cases.</p><p><strong>Conclusion: </strong>This study indicates that ChatGPT has potential in clinical MDT applications, particularly in demonstrating more comprehensive consideration of clinical factors. However, ChatGPT still has deficiencies in accuracy, which could lead to incorrect healthcare decisions. Therefore, further development and clinical validation of LLMs are necessary. Recognizing the current limitations of LLMs, it is essential to use them with caution in clinical practice.</p><p><strong>Trial registration: </strong>Not applicable to the present retrospective study. For transparency, a related prospective extension is registered at ChiCTR (ChiCTR2400088563; registered on 21 August 2024).</p>","PeriodicalId":9340,"journal":{"name":"BMC Medical Informatics and Decision Making","volume":"25 1","pages":"340"},"PeriodicalIF":3.8000,"publicationDate":"2025-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12465737/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"BMC Medical Informatics and Decision Making","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1186/s12911-025-03181-7","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MEDICAL INFORMATICS","Score":null,"Total":0}
引用次数: 0

Abstract

Background: Multidisciplinary treatment (MDT) consultations are essential for managing complex patients. However, resource and time constraints can limit their quality. Large language models (LLMs) have shown potential in assisting clinical decision-making, but their performance in complex MDT scenarios remains unclear. This study aims to evaluate the quality of MDT recommendations generated by ChatGPT compared to those provided by physicians.

Methods: Clinical data from 64 patient cases were retrospectively included in the study. ChatGPT was asked to provide specific MDT recommendations for each case. Two experienced physicians evaluated and scored the responses in a blinded manner across five aspects: comprehensiveness, accuracy, feasibility, safety, and efficiency, each assessed by two questions.
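
The abstract does not give the per-question rating scale; assuming each of the ten questions (five aspects, two questions each) is rated 1 to 5, the 50-point overall score would aggregate as in this minimal sketch (the names and scale are illustrative assumptions, not the authors' instrument):

```python
# Minimal sketch of the rubric aggregation described above.
# Assumption: each of the 10 questions (5 aspects x 2 questions) is rated
# on a 1-5 scale, so one response's overall score has a maximum of 50.
ASPECTS = ["comprehensiveness", "accuracy", "feasibility", "safety", "efficiency"]
QUESTIONS_PER_ASPECT = 2

def overall_score(ratings: dict) -> int:
    """Sum the per-question ratings of one response into an overall score (max 50)."""
    for aspect in ASPECTS:
        assert len(ratings[aspect]) == QUESTIONS_PER_ASPECT
        assert all(1 <= r <= 5 for r in ratings[aspect])
    return sum(sum(ratings[aspect]) for aspect in ASPECTS)

# Example: a response rated 4 and 5 on every question scores 45 out of 50.
example_ratings = {aspect: [4, 5] for aspect in ASPECTS}
print(overall_score(example_ratings))  # 45
```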

Results: The median overall score for ChatGPT was 41.0 out of 50.0, which was lower than the MDT physicians' median score of 43.5 (p = 0.001). Compared to the MDT physicians' responses, ChatGPT excelled in comprehensiveness (p < 0.001) but fell short in accuracy (p < 0.001), feasibility (p < 0.001), and efficiency (p = 0.003). Analysis of specific questions revealed that ChatGPT lacked the ability to reason through the etiologies of complex cases.
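
The abstract does not name the statistical test. Given the paired design (the same 64 cases scored for both ChatGPT and the MDT physicians) and the reporting of medians, a paired nonparametric test such as the Wilcoxon signed-rank test would be a plausible choice; the sketch below illustrates that kind of comparison on synthetic placeholder scores, not the study's data.

```python
# Illustrative sketch of a paired, nonparametric comparison of overall scores.
# Assumptions: Wilcoxon signed-rank test (the paper's actual test is not stated
# in the abstract) and synthetic placeholder scores, not the study's data.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
chatgpt_scores = rng.integers(35, 48, size=64)    # placeholder values
physician_scores = rng.integers(38, 50, size=64)  # placeholder values

stat, p_value = wilcoxon(chatgpt_scores, physician_scores)
print(f"median ChatGPT: {np.median(chatgpt_scores):.1f}, "
      f"median physicians: {np.median(physician_scores):.1f}, "
      f"p = {p_value:.3f}")
```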

Conclusion: This study indicates that ChatGPT has potential in clinical MDT applications, particularly in its more comprehensive consideration of clinical factors. However, ChatGPT still has deficiencies in accuracy that could lead to incorrect healthcare decisions, so further development and clinical validation of LLMs are necessary. Given these limitations, LLMs should be used with caution in clinical practice.

Trial registration: Not applicable to the present retrospective study. For transparency, a related prospective extension is registered at ChiCTR (ChiCTR2400088563; registered on 21 August 2024).

Source journal metrics: CiteScore 7.20; self-citation rate 5.70%; articles published 297; review turnaround approximately 1 month.
About the journal: BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles on the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.