{"title":"Comparative analysis of GPT-4-based ChatGPT's diagnostic performance with radiologists using real-world radiology reports of brain tumors.","authors":"Yasuhito Mitsuyama, Hiroyuki Tatekawa, Hirotaka Takita, Fumi Sasaki, Akane Tashiro, Satoshi Oue, Shannon L Walston, Yuta Nonomiya, Ayumi Shintani, Yukio Miki, Daiju Ueda","doi":"10.1007/s00330-024-11032-8","DOIUrl":null,"url":null,"abstract":"<p><strong>Objectives: </strong>Large language models like GPT-4 have demonstrated potential for diagnosis in radiology. Previous studies investigating this potential primarily utilized quizzes from academic journals. This study aimed to assess the diagnostic capabilities of GPT-4-based Chat Generative Pre-trained Transformer (ChatGPT) using actual clinical radiology reports of brain tumors and compare its performance with that of neuroradiologists and general radiologists.</p><p><strong>Methods: </strong>We collected brain MRI reports written in Japanese from preoperative brain tumor patients at two institutions from January 2017 to December 2021. The MRI reports were translated into English by radiologists. GPT-4 and five radiologists were presented with the same textual findings from the reports and asked to suggest differential and final diagnoses. The pathological diagnosis of the excised tumor served as the ground truth. McNemar's test and Fisher's exact test were used for statistical analysis.</p><p><strong>Results: </strong>In a study analyzing 150 radiological reports, GPT-4 achieved a final diagnostic accuracy of 73%, while radiologists' accuracy ranged from 65 to 79%. GPT-4's final diagnostic accuracy using reports from neuroradiologists was higher at 80%, compared to 60% using those from general radiologists. In the realm of differential diagnoses, GPT-4's accuracy was 94%, while radiologists' fell between 73 and 89%. Notably, for these differential diagnoses, GPT-4's accuracy remained consistent whether reports were from neuroradiologists or general radiologists.</p><p><strong>Conclusion: </strong>GPT-4 exhibited good diagnostic capability, comparable to neuroradiologists in differentiating brain tumors from MRI reports. GPT-4 can be a second opinion for neuroradiologists on final diagnoses and a guidance tool for general radiologists and residents.</p><p><strong>Clinical relevance statement: </strong>This study evaluated GPT-4-based ChatGPT's diagnostic capabilities using real-world clinical MRI reports from brain tumor cases, revealing that its accuracy in interpreting brain tumors from MRI findings is competitive with radiologists.</p><p><strong>Key points: </strong>We investigated the diagnostic accuracy of GPT-4 using real-world clinical MRI reports of brain tumors. GPT-4 achieved final and differential diagnostic accuracy that is comparable with neuroradiologists. 
GPT-4 has the potential to improve the diagnostic process in clinical radiology.</p>","PeriodicalId":12076,"journal":{"name":"European Radiology","volume":" ","pages":"1938-1947"},"PeriodicalIF":4.7000,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11913992/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"European Radiology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1007/s00330-024-11032-8","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/8/28 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING","Score":null,"Total":0}
Abstract
Objectives: Large language models like GPT-4 have demonstrated potential for diagnosis in radiology. Previous studies investigating this potential primarily utilized quizzes from academic journals. This study aimed to assess the diagnostic capabilities of GPT-4-based Chat Generative Pre-trained Transformer (ChatGPT) using actual clinical radiology reports of brain tumors and compare its performance with that of neuroradiologists and general radiologists.
Methods: We collected brain MRI reports written in Japanese from preoperative brain tumor patients at two institutions from January 2017 to December 2021. The MRI reports were translated into English by radiologists. GPT-4 and five radiologists were presented with the same textual findings from the reports and asked to suggest differential and final diagnoses. The pathological diagnosis of the excised tumor served as the ground truth. McNemar's test and Fisher's exact test were used for statistical analysis.
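The abstract only names the statistical tests; as a minimal, hypothetical sketch of how the paired comparison with McNemar's test might be set up in Python (the counts below are illustrative placeholders that merely sum to 150 reports, not the study's data):

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# Paired correct/incorrect outcomes for GPT-4 and one radiologist on the
# same set of reports (illustrative counts only).
# rows: GPT-4 correct, GPT-4 incorrect
# cols: radiologist correct, radiologist incorrect
table = np.array([
    [95, 15],
    [20, 20],
])

# Exact McNemar test, which considers only the discordant pairs (15 vs 20 here)
result = mcnemar(table, exact=True)
print(f"McNemar statistic = {result.statistic:.0f}, p-value = {result.pvalue:.3f}")
```

McNemar's test is the natural choice for this part of the design because GPT-4 and each radiologist were evaluated on the same reports, so their correct/incorrect outcomes are paired rather than independent.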
Results: In this study of 150 radiology reports, GPT-4 achieved a final diagnostic accuracy of 73%, while the radiologists' accuracy ranged from 65% to 79%. GPT-4's final diagnostic accuracy was higher when using reports written by neuroradiologists (80%) than when using those written by general radiologists (60%). For differential diagnoses, GPT-4's accuracy was 94%, while the radiologists' accuracy ranged from 73% to 89%. Notably, GPT-4's differential diagnostic accuracy remained consistent whether the reports came from neuroradiologists or general radiologists.
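For the unpaired comparison of GPT-4's accuracy between report sources, Fisher's exact test (also named in the Methods) could be applied to a 2x2 contingency table. The sketch below assumes a split of 100 neuroradiologist and 50 general-radiologist reports chosen only to match the reported 80% and 60% accuracies; it is not the study's actual table:

```python
from scipy.stats import fisher_exact

# 2x2 contingency table of GPT-4 final-diagnosis outcomes by report source
# (hypothetical split: 100 neuroradiologist and 50 general-radiologist reports)
# rows: neuroradiologist reports, general-radiologist reports
# cols: GPT-4 correct, GPT-4 incorrect
table = [
    [80, 20],  # 80% correct on neuroradiologist reports
    [30, 20],  # 60% correct on general-radiologist reports
]

odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(f"odds ratio = {odds_ratio:.2f}, p-value = {p_value:.4f}")
```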
Conclusion: GPT-4 exhibited good diagnostic capability, comparable to that of neuroradiologists, in differentiating brain tumors from MRI reports. GPT-4 can serve as a second opinion for neuroradiologists on final diagnoses and as a guidance tool for general radiologists and residents.
Clinical relevance statement: This study evaluated GPT-4-based ChatGPT's diagnostic capabilities using real-world clinical MRI reports from brain tumor cases, revealing that its accuracy in interpreting brain tumors from MRI findings is competitive with that of radiologists.
Key points: We investigated the diagnostic accuracy of GPT-4 using real-world clinical MRI reports of brain tumors. GPT-4 achieved final and differential diagnostic accuracy comparable to that of neuroradiologists. GPT-4 has the potential to improve the diagnostic process in clinical radiology.
Journal description:
European Radiology (ER) continuously updates scientific knowledge in radiology by publishing strong original articles and state-of-the-art reviews written by leading radiologists. A well-balanced combination of review articles, original papers, short communications from European radiological congresses, and information on society matters makes ER an indispensable source of current information in this field.
This is the Journal of the European Society of Radiology, and the official journal of a number of societies.
From 2004 to 2008, supplements to European Radiology were published under its companion title, European Radiology Supplements (ISSN 1613-3749).