The Accuracy of ChatGPT-4o in Interpreting Chest and Abdominal X-Ray Images.

Impact factor: 3; Zone 3 (Medicine); JCR Q2 (Health Care Sciences & Services)
Pietro G Lacaita, Malik Galijasevic, Michael Swoboda, Leonhard Gruber, Yannick Scharll, Fabian Barbieri, Gerlig Widmann, Gudrun M Feuchtner
Journal of Personalized Medicine, vol. 15, no. 5. Published 2025-05-10. Journal Article.
DOI: https://doi.org/10.3390/jpm15050194
Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12113413/pdf/

Abstract

Background/Objectives: Large language models (LLMs), such as ChatGPT, have emerged as potential clinical support tools to enhance precision in personalized patient care, but their reliability in radiological image interpretation remains uncertain. The primary aim of our study was to evaluate the diagnostic accuracy of ChatGPT-4o in interpreting chest X-rays (CXRs) and abdominal X-rays (AXRs) by comparing its performance to expert radiology findings; the secondary aims were to assess diagnostic confidence and patient safety.

Methods: A total of 500 X-rays, comprising 257 CXRs (51.4%) and 243 AXRs (48.6%), were analyzed. Diagnoses made by ChatGPT-4o were compared to expert interpretations. Confidence scores (1-4) were assigned, and responses were evaluated for patient safety.

Results: ChatGPT-4o correctly identified 345 of 500 (69%) pathologies (95% CI: 64.81-72.9). For AXRs, 175 of 243 (72.02%) pathologies were correctly diagnosed (95% CI: 66.06-77.28), while for CXRs, 170 of 257 (66.15%) were correctly diagnosed (95% CI: 60.16-71.66). Among CXRs, the highest detection rates were observed for pulmonary edema, tumor, pneumonia, pleural effusion, cardiomegaly, and emphysema, and lower rates were observed for pneumothorax, rib fractures, and enlarged mediastinum. AXR performance was highest for intestinal obstruction and foreign bodies, and weaker for pneumoperitoneum, renal calculi, and diverticulitis. Confidence scores were higher for AXRs (mean 3.45 ± 1.1) than for CXRs (mean 2.48 ± 1.45). All responses (100%) were considered safe for the patient. Interobserver agreement was high (kappa = 0.920), and reliability (second prompt) was moderate (kappa = 0.750).

Conclusions: ChatGPT-4o demonstrated moderate accuracy in the interpretation of X-rays, with higher accuracy for AXRs than for CXRs. Improvements are required before it can be used as an efficient clinical support tool.
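
The abstract does not say how the 95% confidence intervals were computed. As a hedged illustration only, the sketch below applies the Wilson score interval to the reported counts; this method is an assumption, chosen because it appears to reproduce the reported bounds to within rounding. The function name `wilson_interval` and the dictionary `reported` are illustrative and do not come from the paper.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    # Center of the interval is shrunk slightly toward 0.5.
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Counts taken from the abstract: (correctly identified, total images).
reported = {"All": (345, 500), "AXR": (175, 243), "CXR": (170, 257)}

for label, (k, n) in reported.items():
    lo, hi = wilson_interval(k, n)
    print(f"{label}: {100 * k / n:.2f}% (95% CI: {100 * lo:.2f}-{100 * hi:.2f})")
```

A simple normal-approximation (Wald) interval would give slightly different bounds for the same counts, so the agreement here suggests, but does not prove, that a Wilson-type interval was used.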

Source journal: Journal of Personalized Medicine (Medicine, miscellaneous)
CiteScore: 4.10
Self-citation rate: 0.00%
Articles published: 1878
Review time: 11 weeks
About the journal: Journal of Personalized Medicine (JPM; ISSN 2075-4426) is an international, open access journal aimed at bringing all aspects of personalized medicine to one platform. JPM publishes cutting-edge, innovative preclinical and translational scientific research and technologies related to personalized medicine (e.g., pharmacogenomics/proteomics, systems biology). JPM recognizes that personalized medicine (the assessment of genetic, environmental and host factors that cause variability of individuals) is a challenging, transdisciplinary topic that requires discussions from a range of experts. For a comprehensive perspective of personalized medicine, JPM aims to integrate expertise from the molecular and translational sciences, therapeutics and diagnostics, as well as discussions of regulatory, social, ethical and policy aspects. We provide a forum to bring together academic and clinical researchers, biotechnology, diagnostic and pharmaceutical companies, health professionals, regulatory and ethical experts, and government and regulatory authorities.