Accuracy and Readability of ChatGPT on Potential Complications of Interventional Radiology Procedures: AI-Powered Patient Interviewing

IF 3.8 · Zone 2 (Medicine) · Q1 RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING

Academic Radiology · Pub Date: 2025-03-01 · DOI: 10.1016/j.acra.2024.10.028

Esat Kaba, Mehmet Beyazal, Fatma Beyazal Çeliker, İbrahim Yel, Thomas J. Vogl

Volume 32, Issue 3, Pages 1547-1553 · https://www.sciencedirect.com/science/article/pii/S1076633224007918

Citations: 0

Abstract

Rationale and Objectives

It is crucial to inform the patient about potential complications and obtain consent before interventional radiology procedures. In this study, we investigated the accuracy, reliability, and readability of the information provided by ChatGPT-4 about potential complications of interventional radiology procedures.

Materials and Methods

ChatGPT-4 was asked about the potential major and minor complications of 25 different interventional radiology procedures (8 non-vascular, 17 vascular). Two experienced interventional radiologists (>25 years and 10 years of experience) rated the responses on a 5-point Likert scale against the Cardiovascular and Interventional Radiological Society of Europe (CIRSE) guidelines. Agreement between the two radiologists' scores was evaluated with the Wilcoxon signed-rank test, the intraclass correlation coefficient (ICC), and the Pearson correlation coefficient (PCC). In addition, readability and complexity were quantified using the Flesch-Kincaid Grade Level, the Flesch Reading Ease score, and the Simple Measure of Gobbledygook (SMOG) index.
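The three readability indices named above are closed-form formulas over sentence, word, and syllable counts. The following Python sketch implements the standard published formulas for illustration only. The abstract does not say which software the authors used. The helper `count_syllables` is a hypothetical, deliberately crude vowel-group heuristic, so its results will differ somewhat from those of dedicated readability tools. The example sentences are made up.

```python
import math
import re


def count_syllables(word: str) -> int:
    """Hypothetical heuristic: count vowel groups, minus a silent final 'e'.

    Real readability tools use pronunciation dictionaries or richer rules,
    so this is only an approximation.
    """
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1  # e.g. "site" -> 1 syllable
    return max(count, 1)


def text_stats(text: str) -> tuple[int, int, int, int]:
    """Return (sentences, words, syllables, polysyllabic words of 3+ syllables)."""
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        raise ValueError("text contains no words")
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    syllables = [count_syllables(w) for w in words]
    polysyllables = sum(1 for s in syllables if s >= 3)
    return sentences, len(words), sum(syllables), polysyllables


def flesch_reading_ease(text: str) -> float:
    """Higher = easier; roughly 30-50 is conventionally read as college level."""
    s, w, syl, _ = text_stats(text)
    return 206.835 - 1.015 * (w / s) - 84.6 * (syl / w)


def flesch_kincaid_grade(text: str) -> float:
    """Approximate U.S. school grade needed to understand the text."""
    s, w, syl, _ = text_stats(text)
    return 0.39 * (w / s) + 11.8 * (syl / w) - 15.59


def smog_index(text: str) -> float:
    """SMOG grade; the formula is normalised to a 30-sentence sample."""
    s, _, _, poly = text_stats(text)
    return 1.0430 * math.sqrt(poly * (30 / s)) + 3.1291


if __name__ == "__main__":
    simple = "The cat sat."
    clinical = (
        "Percutaneous transhepatic cholangiography may cause haemorrhage, "
        "cholangitis, or bile leakage."
    )
    for label, text in [("simple", simple), ("clinical", clinical)]:
        print(
            f"{label}: FRE={flesch_reading_ease(text):.1f} "
            f"FKGL={flesch_kincaid_grade(text):.1f} SMOG={smog_index(text):.1f}"
        )
```

Higher Flesch-Kincaid and SMOG values indicate harder text, as do lower Flesch Reading Ease values. Long, polysyllabic clinical terms push all three indices toward the college-level range reported in this study.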

Results

Interventional radiologist 1 (IR1) and interventional radiologist 2 (IR2) awarded totals of 104 and 109 points, respectively, out of a possible 125 points across all procedures. There was no statistically significant difference between the two IRs' total scores (p = 0.244), and agreement across all procedure ratings was high (ICC = 0.928). Both IRs awarded 34 of 40 points to the eight non-vascular procedures, while the 17 vascular procedures received 70 of 85 points from IR1 and 75 from IR2. Agreement between the two observers was good, with PCC values of 0.908 and 0.896 for non-vascular and vascular procedures, respectively. Overall readability was low: the mean Flesch-Kincaid Grade Level, Flesch Reading Ease score, and SMOG index were 12.51 ± 1.14 (college level), 30.27 ± 8.38 (college level), and 14.46 ± 0.76 (college level), respectively. There was no statistically significant difference in readability between non-vascular and vascular procedures (p = 0.16).
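The reported totals convert directly into percentages of the maximum score, which is 5 points per procedure. The ICC and the PCC measure inter-rater agreement in different ways. The following sketch is illustrative only. The abstract does not state which ICC form the authors used, so `icc_2_1` assumes the two-way random-effects, absolute-agreement, single-rater form. The rater scores in the demo are hypothetical, not study data.

```python
import math
from statistics import mean

LIKERT_MAX = 5  # each response was scored on a 5-point Likert scale


def percent_of_max(points: float, n_procedures: int) -> float:
    """Score as a percentage of the maximum attainable (5 points per procedure)."""
    return 100 * points / (LIKERT_MAX * n_procedures)


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two raters' scores for the same items."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("Pearson r is undefined when a rater's scores are constant")
    return sxy / math.sqrt(sxx * syy)


def icc_2_1(ratings: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is rater j's score for item (procedure) i.
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]
    # Two-way ANOVA sums of squares: items (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


if __name__ == "__main__":
    # Totals reported in the abstract: (label, points, number of procedures).
    for label, points, n in [
        ("IR1, all procedures", 104, 25),
        ("IR2, all procedures", 109, 25),
        ("IR1 and IR2, non-vascular", 34, 8),
        ("IR1, vascular", 70, 17),
        ("IR2, vascular", 75, 17),
    ]:
        print(f"{label}: {percent_of_max(points, n):.1f}% of maximum")

    # Hypothetical scores (NOT study data): rater B is always one point higher.
    rater_a = [1, 2, 3, 4]
    rater_b = [2, 3, 4, 5]
    print(f"Pearson r = {pearson_r(rater_a, rater_b):.3f}")
    print(f"ICC(2,1)  = {icc_2_1([list(p) for p in zip(rater_a, rater_b)]):.3f}")
```

In the hypothetical demo, rater B scores every item exactly one point higher than rater A. Pearson's r is therefore 1, while ICC(2,1) is 10/13 (about 0.77). Pearson's r measures only linear association. An absolute-agreement ICC also penalizes systematic offsets between raters.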

Conclusion

ChatGPT-4 performed remarkably well, highlighting its potential to improve access to information about interventional radiology procedures and to support the creation of educational materials for patients. Our findings indicate that ChatGPT provides accurate information with no evidence of hallucinations. However, a high level of education and health literacy is required to fully comprehend its responses.

Source journal

Academic Radiology (Medicine / Nuclear Medicine)

CiteScore: 7.60

Self-citation rate: 10.40%

Articles published: 432

Review time: 18 days

Journal description: Academic Radiology publishes original reports of clinical and laboratory investigations in diagnostic imaging, the diagnostic use of radioactive isotopes, computed tomography, positron emission tomography, magnetic resonance imaging, ultrasound, digital subtraction angiography, image-guided interventions and related techniques. It also includes brief technical reports describing original observations, techniques, and instrumental developments; state-of-the-art reports on clinical issues, new technology and other topics of current medical importance; meta-analyses; scientific studies and opinions on radiologic education; and letters to the Editor.