Assessing GPT-4 multimodal performance in radiological image analysis.

IF 4.7 2区 医学 Q1 RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING
European Radiology Pub Date : 2025-04-01 Epub Date: 2024-08-30 DOI:10.1007/s00330-024-11035-5
Dana Brin, Vera Sorin, Yiftach Barash, Eli Konen, Benjamin S Glicksberg, Girish N Nadkarni, Eyal Klang
{"title":"Assessing GPT-4 multimodal performance in radiological image analysis.","authors":"Dana Brin, Vera Sorin, Yiftach Barash, Eli Konen, Benjamin S Glicksberg, Girish N Nadkarni, Eyal Klang","doi":"10.1007/s00330-024-11035-5","DOIUrl":null,"url":null,"abstract":"<p><strong>Objectives: </strong>This study aims to assess the performance of a multimodal artificial intelligence (AI) model capable of analyzing both images and textual data (GPT-4V), in interpreting radiological images. It focuses on a range of modalities, anatomical regions, and pathologies to explore the potential of zero-shot generative AI in enhancing diagnostic processes in radiology.</p><p><strong>Methods: </strong>We analyzed 230 anonymized emergency room diagnostic images, consecutively collected over 1 week, using GPT-4V. Modalities included ultrasound (US), computerized tomography (CT), and X-ray images. The interpretations provided by GPT-4V were then compared with those of senior radiologists. This comparison aimed to evaluate the accuracy of GPT-4V in recognizing the imaging modality, anatomical region, and pathology present in the images.</p><p><strong>Results: </strong>GPT-4V identified the imaging modality correctly in 100% of cases (221/221), the anatomical region in 87.1% (189/217), and the pathology in 35.2% (76/216). However, the model's performance varied significantly across different modalities, with anatomical region identification accuracy ranging from 60.9% (39/64) in US images to 97% (98/101) and 100% (52/52) in CT and X-ray images (p < 0.001). Similarly, pathology identification ranged from 9.1% (6/66) in US images to 36.4% (36/99) in CT and 66.7% (34/51) in X-ray images (p < 0.001). These variations indicate inconsistencies in GPT-4V's ability to interpret radiological images accurately.</p><p><strong>Conclusion: </strong>While the integration of AI in radiology, exemplified by multimodal GPT-4, offers promising avenues for diagnostic enhancement, the current capabilities of GPT-4V are not yet reliable for interpreting radiological images. This study underscores the necessity for ongoing development to achieve dependable performance in radiology diagnostics.</p><p><strong>Clinical relevance statement: </strong>Although GPT-4V shows promise in radiological image interpretation, its high diagnostic hallucination rate (> 40%) indicates it cannot be trusted for clinical use as a standalone tool. Improvements are necessary to enhance its reliability and ensure patient safety.</p><p><strong>Key points: </strong>GPT-4V's capability in analyzing images offers new clinical possibilities in radiology. GPT-4V excels in identifying imaging modalities but demonstrates inconsistent anatomy and pathology detection. Ongoing AI advancements are necessary to enhance diagnostic reliability in radiological applications.</p>","PeriodicalId":12076,"journal":{"name":"European Radiology","volume":" ","pages":"1959-1965"},"PeriodicalIF":4.7000,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11914349/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"European Radiology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1007/s00330-024-11035-5","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/8/30 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING","Score":null,"Total":0}
引用次数: 0

Abstract

Objectives: This study aims to assess the performance of a multimodal artificial intelligence (AI) model capable of analyzing both images and textual data (GPT-4V), in interpreting radiological images. It focuses on a range of modalities, anatomical regions, and pathologies to explore the potential of zero-shot generative AI in enhancing diagnostic processes in radiology.

Methods: We analyzed 230 anonymized emergency room diagnostic images, consecutively collected over 1 week, using GPT-4V. Modalities included ultrasound (US), computerized tomography (CT), and X-ray images. The interpretations provided by GPT-4V were then compared with those of senior radiologists. This comparison aimed to evaluate the accuracy of GPT-4V in recognizing the imaging modality, anatomical region, and pathology present in the images.

Results: GPT-4V identified the imaging modality correctly in 100% of cases (221/221), the anatomical region in 87.1% (189/217), and the pathology in 35.2% (76/216). However, the model's performance varied significantly across different modalities, with anatomical region identification accuracy ranging from 60.9% (39/64) in US images to 97% (98/101) and 100% (52/52) in CT and X-ray images (p < 0.001). Similarly, pathology identification ranged from 9.1% (6/66) in US images to 36.4% (36/99) in CT and 66.7% (34/51) in X-ray images (p < 0.001). These variations indicate inconsistencies in GPT-4V's ability to interpret radiological images accurately.

Conclusion: While the integration of AI in radiology, exemplified by multimodal GPT-4, offers promising avenues for diagnostic enhancement, the current capabilities of GPT-4V are not yet reliable for interpreting radiological images. This study underscores the necessity for ongoing development to achieve dependable performance in radiology diagnostics.

Clinical relevance statement: Although GPT-4V shows promise in radiological image interpretation, its high diagnostic hallucination rate (> 40%) indicates it cannot be trusted for clinical use as a standalone tool. Improvements are necessary to enhance its reliability and ensure patient safety.

Key points: GPT-4V's capability in analyzing images offers new clinical possibilities in radiology. GPT-4V excels in identifying imaging modalities but demonstrates inconsistent anatomy and pathology detection. Ongoing AI advancements are necessary to enhance diagnostic reliability in radiological applications.

Abstract Image

评估 GPT-4 在放射图像分析中的多模式性能。
研究目的本研究旨在评估能够分析图像和文本数据的多模态人工智能(AI)模型(GPT-4V)在解读放射图像方面的性能。它侧重于一系列模式、解剖区域和病理,以探索零镜头生成式人工智能在增强放射学诊断过程中的潜力:我们使用 GPT-4V 分析了 230 幅匿名急诊室诊断图像,这些图像是在一周内连续收集的。图像模式包括超声波(US)、计算机断层扫描(CT)和 X 光图像。然后将 GPT-4V 提供的判读结果与资深放射科医生的判读结果进行比较。这项比较旨在评估 GPT-4V 在识别图像中的成像模式、解剖区域和病理方面的准确性:GPT-4V在100%的病例(221/221)中正确识别了成像模式,在87.1%的病例(189/217)中正确识别了解剖区域,在35.2%的病例(76/216)中正确识别了病理。然而,该模型在不同模式下的表现差异很大,解剖区域识别准确率从 US 图像的 60.9%(39/64)到 CT 和 X 光图像的 97%(98/101)和 100%(52/52)不等(p 结论):以多模态 GPT-4 为代表的人工智能在放射学中的应用为提高诊断水平提供了广阔的前景,但 GPT-4V 目前在解读放射图像方面的能力尚不可靠。这项研究强调了持续开发的必要性,以便在放射诊断中实现可靠的性能:尽管 GPT-4V 在放射图像解读方面大有可为,但其较高的诊断幻觉率(> 40%)表明它不能作为独立工具用于临床。有必要对其进行改进,以提高其可靠性并确保患者安全:GPT-4V分析图像的能力为放射学提供了新的临床可能性。GPT-4V在识别成像模式方面表现出色,但在解剖学和病理学检测方面表现出不一致性。要提高放射学应用中诊断的可靠性,就必须不断改进人工智能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
European Radiology
European Radiology 医学-核医学
CiteScore
11.60
自引率
8.50%
发文量
874
审稿时长
2-4 weeks
期刊介绍: European Radiology (ER) continuously updates scientific knowledge in radiology by publication of strong original articles and state-of-the-art reviews written by leading radiologists. A well balanced combination of review articles, original papers, short communications from European radiological congresses and information on society matters makes ER an indispensable source for current information in this field. This is the Journal of the European Society of Radiology, and the official journal of a number of societies. From 2004-2008 supplements to European Radiology were published under its companion, European Radiology Supplements, ISSN 1613-3749.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信