A retrospective evaluation of the potential of ChatGPT in the accurate diagnosis of acute stroke.

IF 1.4 | Zone 4 (Medicine) | JCR Q3: Radiology, Nuclear Medicine & Medical Imaging
Beyza Nur Kuzan, İsmail Meşe, Servan Yaşar, Taha Yusuf Kuzan
Journal: Diagnostic and Interventional Radiology. Publication date: 2024-09-02. Publication type: Journal Article.
DOI: 10.4274/dir.2024.242892 (https://doi.org/10.4274/dir.2024.242892)

Abstract

Purpose: Stroke is a neurological emergency requiring rapid, accurate diagnosis to prevent severe consequences. Early diagnosis is crucial for reducing morbidity and mortality. Artificial intelligence (AI) diagnosis support tools, such as Chat Generative Pre-trained Transformer (ChatGPT), offer rapid diagnostic advantages. This study assesses ChatGPT's accuracy in interpreting diffusion-weighted imaging (DWI) for acute stroke diagnosis.

Methods: A retrospective analysis was conducted to identify the presence of stroke using DWI and apparent diffusion coefficient (ADC) map images. Patients aged >18 years who exhibited diffusion restriction and had a clinically explainable condition were included. Patients with artifacts that affected image homogeneity, accuracy, or clarity, as well as those who had undergone previous surgery or had a history of stroke, were excluded. ChatGPT was asked four consecutive questions covering identification of the magnetic resonance imaging (MRI) sequence, demonstration of diffusion restriction on the ADC map after sequence recognition, identification of the affected hemisphere, and identification of the specific lobe. Each question was repeated 10 times to ensure consistency. Senior radiologists subsequently verified ChatGPT's responses, classifying each as correct or incorrect. A response was counted as incorrect if it was only partially correct or suggested multiple answers. These responses were systematically recorded, as were non-responses, in which ChatGPT-4V failed to provide an answer to a query. ChatGPT-4V's performance was assessed by calculating the number and percentage of correct responses, incorrect responses, and non-responses across all images and questions; the percentage of correct responses is reported as "accuracy." ChatGPT-4V was considered successful if it answered ≥80% of the examples correctly.
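The scoring rule described in the Methods can be sketched in a few lines of Python. This is only an illustration, not the authors' analysis code: the function name `summarize_responses`, the label strings, and the choice to keep non-responses in the denominator are all assumptions, since the abstract does not state how non-responses entered the percentage.

```python
from collections import Counter

# From the abstract: performance counts as successful at >= 80% correct.
SUCCESS_THRESHOLD = 0.80

# Hypothetical label names for the three outcomes the abstract describes.
ALLOWED_LABELS = ("correct", "incorrect", "non-response")


def summarize_responses(labels):
    """Tally graded responses and apply the 80% success criterion.

    Assumption: non-responses stay in the denominator. Partially correct
    or multi-answer responses must already be labeled "incorrect".
    """
    if not labels:
        raise ValueError("at least one graded response is required")
    unknown = set(labels) - set(ALLOWED_LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")

    counts = Counter(labels)
    total = len(labels)
    accuracy = counts["correct"] / total
    return {
        "counts": {name: counts[name] for name in ALLOWED_LABELS},
        "percentages": {
            name: 100.0 * counts[name] / total for name in ALLOWED_LABELS
        },
        "accuracy": accuracy,
        "successful": accuracy >= SUCCESS_THRESHOLD,
    }


# Made-up grades for one question, for illustration only.
example = ["correct"] * 8 + ["incorrect", "non-response"]
summary = summarize_responses(example)
print(summary["percentages"], summary["successful"])
```

If non-responses were instead excluded from the denominator, the reported percentages would shift, so the denominator choice is worth stating explicitly in any replication.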

Results: A total of 530 diffusion MRI images were evaluated, of which 266 were stroke images and 264 were normal. For the initial query identifying the MRI sequence type, ChatGPT-4V's accuracy was 88.3% for stroke images and 90.1% for normal images. For detecting diffusion restriction, ChatGPT-4V had an accuracy of 79.5% for stroke images, with a 15% false positive rate for normal images. ChatGPT-4V correctly identified the involved brain or cerebellar hemisphere in 26.2% of stroke images and the specific brain lobe or cerebellar area affected in 20.4% of stroke images. The diagnostic sensitivity of ChatGPT-4V in acute stroke was 79.57%, with a specificity of 84.87%, a positive predictive value of 83.86%, a negative predictive value of 80.80%, and a diagnostic odds ratio of 21.86.
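As a consistency check on the reported figures, the diagnostic odds ratio can be recomputed from the sensitivity and specificity using the standard definition, DOR = LR+ / LR-. The sketch below is illustrative: the function name is hypothetical, and the abstract does not give the underlying confusion-matrix counts, so only the two rounded percentages are used as input.

```python
def diagnostic_odds_ratio(sensitivity, specificity):
    """Return the diagnostic odds ratio as LR+ divided by LR-.

    Both inputs are proportions strictly between 0 and 1.
    """
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie in (0, 1)")
    positive_lr = sensitivity / (1.0 - specificity)  # LR+
    negative_lr = (1.0 - sensitivity) / specificity  # LR-
    return positive_lr / negative_lr


# Sensitivity and specificity as reported in the abstract.
dor = diagnostic_odds_ratio(0.7957, 0.8487)
print(f"DOR from rounded inputs: {dor:.2f}")
```

The result comes out at roughly 21.8, close to the reported 21.86; the small gap is presumably due to rounding of the published percentages.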

Conclusion: Despite limitations, ChatGPT shows potential as a supportive tool for healthcare professionals in interpreting diffusion examinations in stroke cases, where timely diagnosis is critical.

Clinical significance: ChatGPT can play an important role in various aspects of stroke cases, such as risk assessment, early diagnosis, and treatment planning.

Source journal
Diagnostic and Interventional Radiology (Medicine: Radiology, Nuclear Medicine and Imaging)
Self-citation rate: 4.80%
Articles published: 0
About the journal: Diagnostic and Interventional Radiology (Diagn Interv Radiol) is the open-access, online-only official publication of the Turkish Society of Radiology. It is published bimonthly, and its publication language is English. The journal publishes original articles, reviews, pictorial essays, and technical notes related to all fields of diagnostic and interventional radiology.