Beyond Text: The Impact of Clinical Context on GPT-4's 12-Lead Electrocardiogram Interpretation Accuracy
Ivan Zeljkovic, Andrej Novak, Ante Lisicic, Ana Jordan, Ana Serman, Ivana Jurin, Nikola Pavlovic, Sime Manola
Canadian Journal of Cardiology, published 2025-02-17. DOI: 10.1016/j.cjca.2025.01.036
Abstract
Background: Artificial intelligence (AI) and large language models (LLMs), such as OpenAI's GPT-4, are increasingly being explored for medical applications. Recently, GPT-4 gained image processing capabilities, enabling it to handle tasks such as image captioning, visual question answering, and potentially interpreting medical data. Despite promising potential in diagnostics, the effectiveness of GPT-4 in interpreting complex 12-lead electrocardiograms (ECGs) remains to be assessed.
Methods: This study used GPT-4 to interpret 150 12-lead ECGs from the Cardiology Research Dubrava (CaRD) registry, spanning a wide range of cardiac pathologies. The ECGs were classified into 4 categories for analysis: arrhythmias, conduction system abnormalities, acute coronary syndrome, and other. Two experiments were conducted: one in which GPT-4 interpreted the ECGs without clinical context, and another in which clinical scenarios were added. A panel of experienced cardiologists evaluated the accuracy of GPT-4's interpretations.
Results: In this cross-sectional observational study, GPT-4 correctly interpreted 19% of ECGs without clinical context and 45% with context, a significant improvement (P < 0.001). Adding clinical scenarios improved interpretative accuracy most markedly in the acute coronary syndrome category (10% vs 70%; P < 0.001). The "other" category showed no significant change (51% vs 59%; P = 0.640), and nonsignificant trends toward improvement were observed in the arrhythmias (9.7% vs 32%; P = 0.059) and conduction system abnormalities (4.8% vs 19%; P = 0.088) categories when clinical context was given.
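The category-level comparison can be made concrete by computing the absolute change in accuracy, in percentage points, for each category. The minimal Python sketch that follows uses only the rates reported in the Results; the names `CATEGORY_RATES`, `OVERALL_RATES`, `point_gain`, and `rank_by_gain` are hypothetical, and the sketch does not reproduce the study's significance testing, because the abstract does not name the statistical test used.

```python
# Correct-interpretation rates (%) as reported in the abstract,
# stored as (without clinical context, with clinical context).
# Hypothetical names; the values are copied from the Results paragraph.
OVERALL_RATES = (19.0, 45.0)

CATEGORY_RATES = {
    "arrhythmias": (9.7, 32.0),
    "conduction system abnormalities": (4.8, 19.0),
    "acute coronary syndrome": (10.0, 70.0),
    "other": (51.0, 59.0),
}


def point_gain(without_ctx: float, with_ctx: float) -> float:
    """Absolute change in accuracy, in percentage points (rounded to 0.1)."""
    return round(with_ctx - without_ctx, 1)


def rank_by_gain(rates: dict[str, tuple[float, float]]) -> list[tuple[str, float]]:
    """Return (category, gain) pairs sorted from largest to smallest gain."""
    gains = [(name, point_gain(*pair)) for name, pair in rates.items()]
    return sorted(gains, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    print(f"overall: +{point_gain(*OVERALL_RATES)} percentage points")
    for name, gain in rank_by_gain(CATEGORY_RATES):
        print(f"{name}: +{gain} percentage points")
```

Ranking by absolute gain places acute coronary syndrome first, consistent with the abstract. Absolute gains alone say nothing about significance, however: the arrhythmia gain (22.3 points) is larger than the "other" gain (8 points), yet neither reached P < 0.05, so the reported P-values, not these differences, carry the inferential weight.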
Conclusions: Although GPT-4 shows potential in aiding 12-lead ECG interpretation, its effectiveness varies significantly with clinical context. The study suggests that GPT-4 alone in its current form may not provide accurate 12-lead ECG interpretation.
About the journal:
The Canadian Journal of Cardiology (CJC) is the official journal of the Canadian Cardiovascular Society (CCS). The CJC is a vehicle for the international dissemination of new knowledge in cardiology and cardiovascular science, particularly serving as the major venue for Canadian cardiovascular medicine.