Can ChatGPT4-vision identify radiologic progression of multiple sclerosis on brain MRI?

Journal impact factor 3.7 (JCR Q1, Radiology, Nuclear Medicine & Medical Imaging)
Brendan S Kelly, Sophie Duignan, Prateek Mathur, Henry Dillon, Edward H Lee, Kristen W Yeom, Pearse A Keane, Aonghus Lawlor, Ronan P Killeen
European Radiology Experimental 2025;9(1):9. Published 15 January 2025 (journal article).
DOI: 10.1186/s41747-024-00547-w
Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11735712/pdf/
Citations: 0

Abstract

Background: The large language model ChatGPT can now accept image input through its GPT4-vision (GPT4V) version. We aimed to compare the performance of GPT4V with that of pretrained U-Net and vision transformer (ViT) models in identifying the progression of multiple sclerosis (MS) on magnetic resonance imaging (MRI).

Methods: Paired, coregistered MR images with and without progression were provided as input to ChatGPT4V in a zero-shot experiment to identify radiologic progression. Its performance was compared with that of pretrained U-Net and ViT models. Accuracy was the primary evaluation metric, and 95% confidence intervals (CIs) were calculated by bootstrapping. We included 170 patients with MS (50 males, 120 females), aged 21-74 years (mean 42.3), imaged at a single institution from 2019 to 2021, each with 2-5 MRI studies (496 in total).
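A bootstrapped CI resamples the evaluated cases with replacement many times and reads the interval off the resulting distribution of the metric. The Python sketch below shows a percentile bootstrap for accuracy. The abstract only says "bootstrapping", so case-level resampling and the percentile method are assumptions here; the function name `bootstrap_accuracy_ci`, the resample count, the seed, and the 85-of-100 example data are hypothetical, and the output is not expected to reproduce the reported intervals exactly. How the six non-answers were scored is not stated in the abstract; this sketch simply requires every case to be flagged 0 or 1.

```python
import random


def bootstrap_accuracy_ci(correct, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for accuracy.

    correct: sequence of 0/1 flags, one per evaluated case (1 = correct call).
    Returns (point_estimate, lower_bound, upper_bound).
    Hypothetical helper for illustration; not the study's code.
    """
    if not correct:
        raise ValueError("need at least one evaluated case")
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(correct)
    point = sum(correct) / n

    # Resample whole cases with replacement and record each resample's accuracy.
    stats = sorted(sum(rng.choices(correct, k=n)) / n for _ in range(n_resamples))

    # Percentile method: take the alpha/2 and 1 - alpha/2 empirical quantiles.
    lo_idx = int(round((alpha / 2) * n_resamples))
    hi_idx = min(n_resamples - 1, int(round((1 - alpha / 2) * n_resamples)) - 1)
    return point, stats[lo_idx], stats[hi_idx]


# Hypothetical data: 100 evaluated pairs, 85 judged correctly (illustrative only).
flags = [1] * 85 + [0] * 15
acc, lo, hi = bootstrap_accuracy_ci(flags)
print(f"accuracy {acc:.0%} (95% CI {lo:.0%}-{hi:.0%})")
```

Resampling whole cases keeps each image pair's prediction intact, which fits a case-level accuracy metric; a study could equally resample at the patient level, which would generally give wider intervals when patients contribute several pairs.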

Results: One hundred seventy patients were included: 110 for training, 30 for tuning, and 30 for testing; 100 unseen image pairs were randomly selected from the test set for evaluation. Both U-Net and ViT had 94% (95% CI: 89-98%) accuracy, while GPT4V had 85% (77-91%). GPT4V gave cautious non-answers in six cases. GPT4V had precision (positive predictive value), recall (sensitivity), and F1 score of 89% (75-93%), 92% (82-98%), and 91% (82-97%), compared with 100% (100-100%), 88% (78-96%), and 94% (88-98%) for U-Net and 94% (87-100%), 94% (88-100%), and 94% (89-98%) for ViT.
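Because F1 is the harmonic mean of precision and recall, F1 = 2PR/(P + R), the F1 values in the Results can be cross-checked against the precision and recall point estimates. The Python sketch below does that arithmetic. It assumes the reported F1 is the standard binary F1 for the "progression" class, which the abstract does not state explicitly, and it starts from percentages already rounded to whole numbers, so small differences are expected. Recomputing gives about 0.905 for GPT4V (reported 91%), 0.936 for U-Net (reported 94%), and 0.94 for ViT (reported 94%), all within one percentage point.

```python
def f1_score(precision, recall):
    """F1 score: harmonic mean of precision and recall (both as fractions).

    precision = TP / (TP + FP)  (positive predictive value)
    recall    = TP / (TP + FN)  (sensitivity)
    """
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when both inputs are 0
    return 2 * precision * recall / (precision + recall)


# Point estimates copied from the abstract, converted to fractions.
reported = {
    "GPT4V": {"precision": 0.89, "recall": 0.92, "f1": 0.91},
    "U-Net": {"precision": 1.00, "recall": 0.88, "f1": 0.94},
    "ViT": {"precision": 0.94, "recall": 0.94, "f1": 0.94},
}

for model, m in reported.items():
    recomputed = f1_score(m["precision"], m["recall"])
    # Inputs are rounded to whole percentages, so allow a one-point tolerance.
    consistent = abs(recomputed - m["f1"]) <= 0.01
    print(f"{model}: recomputed F1 = {recomputed:.3f}, "
          f"reported {m['f1']:.2f}, consistent: {consistent}")
```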

Conclusion: The performance of GPT4V, combined with its accessibility, suggests that it has the potential to impact AI radiology research. However, misclassified cases and overly cautious non-answers confirm that it is not yet ready for clinical use.

Relevance statement: GPT4V can identify the radiologic progression of MS in a simplified experimental setting. However, GPT4V is not a medical device, and its widespread availability highlights the need for caution and education for lay users, especially those with limited access to expert healthcare.

Key points: Without fine-tuning or the need for prior coding experience, GPT4V can perform a zero-shot radiologic change detection task with reasonable accuracy. However, in absolute terms, in a simplified "spot the difference" medical imaging task, GPT4V was inferior to state-of-the-art computer vision methods. GPT4V's performance metrics were more similar to those of the ViT than to those of the U-Net. This is an exploratory experimental study, and GPT4V is not intended for use as a medical device.

Source journal: European Radiology Experimental
Subject category: Medicine - Radiology, Nuclear Medicine and Imaging
CiteScore: 6.70
Self-citation rate: 2.60%
Articles published: 56
Review time: 18 weeks