Generative pre-trained transformer (GPT)-4 support for differential diagnosis in neuroradiology.

IF 2.9; Zone 2 (Medicine); JCR Q2 (Radiology, Nuclear Medicine & Medical Imaging)
Quantitative Imaging in Medicine and Surgery Pub Date : 2024-10-01 Epub Date: 2024-09-23 DOI:10.21037/qims-24-200
Vera Sorin, Eyal Klang, Tamer Sobeh, Eli Konen, Shai Shrot, Adva Livne, Yulian Weissbuch, Chen Hoffmann, Yiftach Barash
Quantitative Imaging in Medicine and Surgery 14(10):7551-7560. Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11485343/pdf/

Abstract

Background: Differential diagnosis in radiology relies on the accurate identification of imaging patterns. The use of large language models (LLMs) in radiology holds promise, with many potential applications that may enhance the efficiency of radiologists' workflow. The study aimed to evaluate the efficacy of generative pre-trained transformer (GPT)-4, an LLM, in providing differential diagnoses in neuroradiology, comparing its performance with that of board-certified neuroradiologists.

Methods: Sixty neuroradiology reports with a variety of diagnoses were entered into GPT-4, which was tasked with generating a top-3 differential diagnosis for each case. The results were compared with the true diagnoses and with the differential diagnoses provided by three blinded neuroradiologists. Diagnostic accuracy and agreement between readers were assessed.
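The top-3 scoring described in the Methods can be illustrated with a minimal Python sketch. It assumes that a case counts as correct when the true diagnosis appears among the first three entries of the differential, and it uses case-insensitive exact string matching. Both the matching rule and the example cases are hypothetical and are not taken from the study, where correctness was presumably judged by readers rather than by string comparison.

```python
def top3_hit(differential, true_diagnosis):
    """Return True if the true diagnosis is among the first three entries.

    Matching is case-insensitive exact string comparison, which is an
    assumption made for illustration only.
    """
    shortlist = [d.strip().lower() for d in differential[:3]]
    return true_diagnosis.strip().lower() in shortlist


def top3_accuracy(cases):
    """Fraction of (differential, true_diagnosis) pairs scored as a hit."""
    hits = sum(top3_hit(diff, truth) for diff, truth in cases)
    return hits / len(cases)


# Hypothetical example cases, not data from the study.
cases = [
    (["meningioma", "schwannoma", "metastasis"], "Meningioma"),
    (["glioblastoma", "lymphoma", "abscess"], "metastasis"),
]

print(top3_accuracy(cases))  # one hit out of two cases
```

In practice, free-text diagnoses rarely match word for word, so a real evaluation would need synonym handling or expert adjudication in place of the string comparison used here.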

Results: Of the 60 patients (mean age 47.8 years, 65% female), GPT-4 included the correct diagnosis in its differential in 61.7% (37/60) of cases, while the neuroradiologists' accuracy ranged from 63.3% (38/60) to 73.3% (44/60). Agreement between GPT-4 and the neuroradiologists, and among the neuroradiologists, was fair to moderate [Cohen's kappa (κw) 0.34-0.44 and κw 0.39-0.54, respectively].
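As a quick check on the arithmetic, the sketch below recomputes the reported percentages from the counts given in the Results and shows how a Cohen's kappa can be computed for two raters. The kappa function is the plain unweighted form, offered only as an illustration; the abstract reports κw and does not describe the weighting scheme, so this is not the authors' computation. The rater vectors are hypothetical.

```python
from collections import Counter


def accuracy_pct(correct, total):
    """Accuracy as a percentage rounded to one decimal place."""
    return round(100 * correct / total, 1)


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of labels.

    Assumes the raters do not agree purely by construction
    (expected agreement below 1); that edge case is not handled.
    """
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[k] * counts_b[k] for k in labels) / (n * n)
    return (observed - expected) / (1 - expected)


# Counts reported in the abstract.
print(accuracy_pct(37, 60))  # GPT-4: 61.7
print(accuracy_pct(38, 60))  # lowest neuroradiologist: 63.3
print(accuracy_pct(44, 60))  # highest neuroradiologist: 73.3

# Hypothetical per-case correctness (1 = correct, 0 = incorrect).
reader_1 = [1, 1, 0, 0]
reader_2 = [1, 0, 0, 0]
print(cohen_kappa(reader_1, reader_2))  # 0.5
```

Kappa corrects raw agreement for the agreement expected by chance, which is why two readers with similar accuracy can still show only fair to moderate agreement.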

Conclusions: GPT-4 shows potential as a support tool for differential diagnosis in neuroradiology, though it was outperformed by human experts. Radiologists should remain mindful of the limitations of LLMs while harnessing their potential to enhance educational and clinical work.

Source journal: Quantitative Imaging in Medicine and Surgery (Medicine: Radiology, Nuclear Medicine and Imaging)
CiteScore: 4.20
Self-citation rate: 17.90%
Articles published: 252