Effectiveness of the GPT-4o Model in Interpreting Electrocardiogram Images for Cardiac Diagnostics: Diagnostic Accuracy Study.

IF 2
JMIR AI Pub Date : 2025-08-22 DOI:10.2196/74426
Haya Engelstein, Roni Ramon-Gonen, Avi Sabbag, Eyal Klang, Karin Sudri, Michal Cohen-Shelly, Israel Barbash
{"title":"Effectiveness of the GPT-4o Model in Interpreting Electrocardiogram Images for Cardiac Diagnostics: Diagnostic Accuracy Study.","authors":"Haya Engelstein, Roni Ramon-Gonen, Avi Sabbag, Eyal Klang, Karin Sudri, Michal Cohen-Shelly, Israel Barbash","doi":"10.2196/74426","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Recent progress has demonstrated the potential of deep learning models in analyzing electrocardiogram (ECG) pathologies. However, this method is intricate, expensive to develop, and designed for specific purposes. Large language models show promise in medical image interpretation, and yet their effectiveness in ECG analysis remains understudied. Generative Pretrained Transformer 4 Omni (GPT-4o), a multimodal artificial intelligence model, capable of processing images and text without task-specific training, may offer an accessible alternative.</p><p><strong>Objective: </strong>This study aimed to evaluate GPT-4o's effectiveness in interpreting 12-lead ECGs, assessing classification accuracy, and exploring methods to enhance its performance.</p><p><strong>Methods: </strong>A total of 6 common ECG diagnoses were evaluated: normal ECG, ST-segment elevation myocardial infarction, atrial fibrillation, right bundle branch block, left bundle branch block, and paced rhythm, with 30 normal ECGs and 10 of each abnormal pattern, totaling 80 cases. Deidentified ECGs were analyzed using OpenAI's GPT-4o. Our study used both zero-shot and few-shot learning methodologies to investigate three main scenarios: (1) ECG image recognition, (2) binary classification of normal versus abnormal ECGs, and (3) multiclass classification into 6 categories.</p><p><strong>Results: </strong>The model excelled in recognizing ECG images, achieving an accuracy of 100%. In the classification of normal or abnormal ECG cases, the few-shot learning approach improved GPT-4o's accuracy by 30% from the baseline, reaching 83% (95% CI 81.8%-84.6%). However, multiclass classification for a specific pathology remained limited, achieving only 41% accuracy.</p><p><strong>Conclusions: </strong>GPT-4o effectively differentiates normal from abnormal ECGs, suggesting its potential as an accessible artificial intelligence-assisted triage tool. Although limited in diagnosing specific cardiac conditions, GPT-4o's capability to interpret ECG images without specialized training highlights its potential for preliminary ECG interpretation in clinical and remote settings.</p>","PeriodicalId":73551,"journal":{"name":"JMIR AI","volume":"4 ","pages":"e74426"},"PeriodicalIF":2.0000,"publicationDate":"2025-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12375907/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"JMIR AI","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2196/74426","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Background: Recent progress has demonstrated the potential of deep learning models in analyzing electrocardiogram (ECG) pathologies. However, this method is intricate, expensive to develop, and designed for specific purposes. Large language models show promise in medical image interpretation, and yet their effectiveness in ECG analysis remains understudied. Generative Pretrained Transformer 4 Omni (GPT-4o), a multimodal artificial intelligence model, capable of processing images and text without task-specific training, may offer an accessible alternative.

Objective: This study aimed to evaluate GPT-4o's effectiveness in interpreting 12-lead ECGs, assessing classification accuracy, and exploring methods to enhance its performance.

Methods: A total of 6 common ECG diagnoses were evaluated: normal ECG, ST-segment elevation myocardial infarction, atrial fibrillation, right bundle branch block, left bundle branch block, and paced rhythm, with 30 normal ECGs and 10 of each abnormal pattern, totaling 80 cases. Deidentified ECGs were analyzed using OpenAI's GPT-4o. Our study used both zero-shot and few-shot learning methodologies to investigate three main scenarios: (1) ECG image recognition, (2) binary classification of normal versus abnormal ECGs, and (3) multiclass classification into 6 categories.

Results: The model excelled in recognizing ECG images, achieving an accuracy of 100%. In the classification of normal or abnormal ECG cases, the few-shot learning approach improved GPT-4o's accuracy by 30% from the baseline, reaching 83% (95% CI 81.8%-84.6%). However, multiclass classification for a specific pathology remained limited, achieving only 41% accuracy.

Conclusions: GPT-4o effectively differentiates normal from abnormal ECGs, suggesting its potential as an accessible artificial intelligence-assisted triage tool. Although limited in diagnosing specific cardiac conditions, GPT-4o's capability to interpret ECG images without specialized training highlights its potential for preliminary ECG interpretation in clinical and remote settings.

Abstract Image

Abstract Image

Abstract Image

gpt - 40模型在心脏诊断中解释心电图图像的有效性:诊断准确性研究。
背景:最近的进展已经证明了深度学习模型在分析心电图(ECG)病理方面的潜力。然而,这种方法复杂,开发成本高,并且是为特定目的而设计的。大型语言模型在医学图像解释中显示出前景,但其在心电图分析中的有效性仍有待研究。生成式预训练Transformer 4 Omni (gpt - 40)是一种多模式人工智能模型,无需特定任务训练即可处理图像和文本,可能是一种可访问的替代方案。目的:本研究旨在评价gpt - 40在12导联心电图解释中的有效性,评估其分类准确性,并探索提高其性能的方法。方法:对心电图正常、st段抬高型心肌梗死、心房颤动、右束支传导阻滞、左束支传导阻滞、心律失常6项常见心电图诊断进行评价,其中正常心电图30例,各异常模式10例,共80例。使用OpenAI的gpt - 40分析鉴定的心电图。我们的研究使用零次和少次学习方法来研究三个主要场景:(1)心电图像识别;(2)正常与异常心电图的二值分类;(3)多类分类,分为6类。结果:该模型具有较好的心电图像识别能力,准确率达到100%。在ECG正常或异常病例的分类中,少射学习方法将gpt - 40的准确率从基线提高了30%,达到83% (95% CI 81.8%-84.6%)。然而,对特定病理的多分类仍然有限,准确率仅为41%。结论:gpt - 40可有效区分正常和异常心电图,提示其作为一种可获得的人工智能辅助分诊工具的潜力。虽然在诊断特定的心脏疾病方面有限,但gpt - 40在没有专门培训的情况下解释心电图图像的能力突出了其在临床和远程环境中进行初步心电图解释的潜力。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信