Generative pre-trained transformer (GPT)-4 support for differential diagnosis in neuroradiology.

IF 2.9 · Zone 2 (Medicine) · Q2 Radiology, Nuclear Medicine & Medical Imaging

Quantitative Imaging in Medicine and Surgery · Pub Date: 2024-10-01 · Epub Date: 2024-09-23 · DOI: 10.21037/qims-24-200

Vera Sorin, Eyal Klang, Tamer Sobeh, Eli Konen, Shai Shrot, Adva Livne, Yulian Weissbuch, Chen Hoffmann, Yiftach Barash

Citations: 0

Abstract

Background: Differential diagnosis in radiology relies on the accurate identification of imaging patterns. The use of large language models (LLMs) in radiology holds promise, with many potential applications that may enhance the efficiency of radiologists' workflow. The study aimed to evaluate the efficacy of generative pre-trained transformer (GPT)-4, an LLM, in providing differential diagnoses in neuroradiology, comparing its performance with that of board-certified neuroradiologists.

Methods: Sixty neuroradiology reports with variable diagnoses were inserted into GPT-4, which was tasked with generating a top-3 differential diagnosis for each case. The results were compared to the true diagnoses and to the differential diagnoses provided by three blinded neuroradiologists. Diagnostic accuracy and agreement between readers were assessed.
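The top-3 scoring described in the Methods can be illustrated with a minimal sketch. It assumes every diagnosis has already been mapped to a canonical text label so that exact matching is meaningful; the abstract does not say how matches were adjudicated, and the function name and toy cases below are hypothetical, not study data.

```python
def top3_hits(true_diagnoses, differentials):
    """Return one boolean per case: is the true diagnosis in the top-3 list?"""
    if len(true_diagnoses) != len(differentials):
        raise ValueError("one differential list is required per case")
    hits = []
    for truth, ddx in zip(true_diagnoses, differentials):
        # Only the first three entries count; compare case-insensitively.
        candidates = {d.strip().lower() for d in ddx[:3]}
        hits.append(truth.strip().lower() in candidates)
    return hits


# Hypothetical toy cases (not taken from the study).
truths = ["meningioma", "glioblastoma", "multiple sclerosis"]
reader = [
    ["Meningioma", "schwannoma", "metastasis"],
    ["lymphoma", "metastasis", "abscess"],
    ["ADEM", "multiple sclerosis", "neuromyelitis optica"],
]
hits = top3_hits(truths, reader)
accuracy = sum(hits) / len(hits)  # fraction of cases with a top-3 hit
```

In practice, free-text diagnoses rarely match exactly, so a study of this kind would more plausibly rely on expert judgment to decide whether a listed diagnosis counts as correct.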

Results: Of the 60 patients (mean age 47.8 years, 65% female), GPT-4 correctly included the diagnoses in its differentials in 61.7% (37/60) of cases, while the neuroradiologists' accuracy ranged from 63.3% (38/60) to 73.3% (44/60). Agreement between GPT-4 and the neuroradiologists, and among the neuroradiologists themselves, was fair to moderate [Cohen's kappa (kw) 0.34-0.44 and kw 0.39-0.54, respectively].
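The reported percentages follow directly from the counts, and the agreement statistic can be sketched in a few lines. The example below is illustrative only: it implements plain (unweighted) Cohen's kappa on per-case correct/incorrect outcomes, which is an assumption, since the abstract's "kw" suggests a weighted variant and does not state what was rated. The helper names and the toy ratings are hypothetical.

```python
from collections import Counter


def percent(count, total):
    """Percentage rounded to one decimal place, as reported in the abstract."""
    return round(100 * count / total, 1)


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("both raters must rate the same non-empty set of cases")
    # Observed agreement: share of cases where the two raters give the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in set(freq_a) | set(freq_b)) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Counts taken from the abstract.
gpt4_accuracy = percent(37, 60)
reader_low = percent(38, 60)
reader_high = percent(44, 60)

# Hypothetical toy ratings (1 = correct, 0 = incorrect), not study data.
kappa = cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0])
```

For two-category ratings such as these, weighted and unweighted kappa coincide; they differ only when there are three or more ordered categories.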

Conclusions: GPT-4 shows potential as a support tool for differential diagnosis in neuroradiology, though it was outperformed by human experts. Radiologists should remain mindful of the limitations of LLMs while harnessing their potential to enhance educational and clinical work.

Source journal

Quantitative Imaging in Medicine and Surgery (Medicine: Radiology, Nuclear Medicine and Imaging)

CiteScore: 4.20

Self-citation rate: 17.90%

Articles published: 252
