Empowering Radiologists With ChatGPT-4o: Comparative Evaluation of Large Language Models and Radiologists in Cardiac Cases.

IF 1.9 | Region 4: Medicine | JCR Q3: RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING
Turay Cesur, Yasin Celal Gunes, Eren Camur, Mustafa Dağli
Journal: Journal of Thoracic Imaging
Publication date: 2025-09-30
Publication type: Journal Article
DOI: 10.1097/RTI.0000000000000846 (https://doi.org/10.1097/RTI.0000000000000846)
Citations: 0

Abstract

Empowering Radiologists With ChatGPT-4o: Comparative Evaluation of Large Language Models and Radiologists in Cardiac Cases.

Purpose: This study evaluated the diagnostic accuracy and differential diagnostic capabilities of 12 large language models (LLMs), 1 cardiac radiologist, and 3 general radiologists in cardiac radiology. The impact of ChatGPT-4o assistance on radiologist performance was also investigated.

Materials and methods: We collected 80 publicly available "Cardiac Case of the Month" cases from the Society of Thoracic Radiology website. The LLMs and general radiologist-III were provided with text-based information, whereas the other radiologists visually assessed the cases with and without ChatGPT-4o assistance. Diagnostic accuracy and differential diagnosis scores (DDx scores) were analyzed using the χ², Kruskal-Wallis, Wilcoxon, McNemar, and Mann-Whitney U tests.
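McNemar's test is the usual choice for paired binary outcomes, such as the same reader being correct or incorrect on the same case with and without assistance. The abstract does not say which comparison each test was applied to, and it does not report case-level data, so the following is only a minimal sketch of the exact (binomial) form of the test. The function name `mcnemar_exact` and the discordant counts used in the example are hypothetical and are not taken from the study.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    b: cases correct only WITHOUT assistance (discordant pair type 1)
    c: cases correct only WITH assistance (discordant pair type 2)
    Concordant pairs do not enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    k = min(b, c)
    # Under the null hypothesis each discordant pair is equally likely
    # to fall in either direction, i.e. Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical discordant counts, for illustration only (not study data).
p_value = mcnemar_exact(b=2, c=15)
print(f"exact McNemar p = {p_value:.4f}")
```

Only the discordant pairs matter here: cases that a reader got right (or wrong) both with and without assistance carry no information about the effect of assistance.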

Results: The unassisted diagnostic accuracy was 72.5% for the cardiac radiologist, 53.8% for general radiologist-I, and 51.3% for general radiologist-II. With ChatGPT-4o assistance, accuracy improved to 78.8%, 70.0%, and 63.8%, respectively. The improvements for general radiologists I and II were statistically significant (P≤0.006). All radiologists' DDx scores improved significantly with ChatGPT-4o assistance (P≤0.05). Remarkably, general radiologist-I's ChatGPT-4o-assisted diagnostic accuracy and DDx score were not significantly different from the cardiac radiologist's unassisted performance (P>0.05). Among the LLMs, Claude 3 Opus and Claude 3.5 Sonnet had the highest accuracy (81.3%), followed by Claude 3 Sonnet (70.0%). Regarding the DDx score, Claude 3 Opus outperformed all other models and general radiologist-III (P<0.05). The accuracy of general radiologist-III improved significantly from 48.8% to 63.8% with ChatGPT-4o assistance (P<0.001).
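Because every reader and model was scored on the same 80 cases, each reported percentage corresponds to a whole number of correct diagnoses. The sketch below recovers those counts from the figures in the abstract; the helper name `implied_correct` and the dictionary keys are illustrative labels, and the counts are an arithmetic inference from the rounded percentages rather than values stated by the authors.

```python
N_CASES = 80  # number of cases stated in the abstract

# Accuracy figures (percent) as reported in the abstract.
reported = {
    "cardiac radiologist, unassisted": 72.5,
    "cardiac radiologist, assisted": 78.8,
    "general radiologist-I, unassisted": 53.8,
    "general radiologist-I, assisted": 70.0,
    "general radiologist-II, unassisted": 51.3,
    "general radiologist-II, assisted": 63.8,
    "general radiologist-III, unassisted": 48.8,
    "general radiologist-III, assisted": 63.8,
    "Claude 3 Opus / Claude 3.5 Sonnet": 81.3,
    "Claude 3 Sonnet": 70.0,
}


def implied_correct(pct: float, n: int = N_CASES) -> int:
    """Nearest whole number of correct cases for a rounded percentage."""
    return round(pct * n / 100)


for label, pct in reported.items():
    k = implied_correct(pct)
    print(f"{label}: {pct}% is about {k}/{N_CASES} cases")
```

Read this way, general radiologist-I's change from 53.8% to 70.0% corresponds to a net gain of 13 correctly diagnosed cases out of 80.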

Conclusions: ChatGPT-4o may enhance the diagnostic performance of general radiologists in cardiac imaging, suggesting its potential as a diagnostic support tool. Further studies are required to assess its clinical integration.

Source journal
Journal of Thoracic Imaging (Medicine: Nuclear Medicine)
CiteScore: 7.10
Self-citation rate: 9.10%
Articles published: 87
Review time: 6-12 weeks
About the journal: Journal of Thoracic Imaging (JTI) provides authoritative information on all aspects of the use of imaging techniques in the diagnosis of cardiac and pulmonary diseases. Original articles and analytical reviews published in the journal present the latest thinking of leading experts on the use of chest radiography, computed tomography, magnetic resonance imaging, positron emission tomography, ultrasound, and other promising imaging techniques in cardiopulmonary radiology. It is the official journal of the Society of Thoracic Radiology, the Japanese Society of Thoracic Radiology, the Korean Society of Thoracic Radiology, and the European Society of Thoracic Imaging.