In the face of confounders: Atrial fibrillation detection - Practitioners vs. ChatGPT
Yuval Avidan, Vsevolod Tabachnikov, Orel Ben Court, Razi Khoury, Amir Aker
Journal of Electrocardiology, vol. 88, 153851 (2024). DOI: 10.1016/j.jelectrocard.2024.153851
Abstract
Introduction: Atrial fibrillation (AF) is the most common arrhythmia in clinical practice, yet concerns about interpretation accuracy among healthcare providers persist. Confounding factors contribute to false-positive and false-negative AF diagnoses and to potentially missed cases. Advances in artificial intelligence show promise in electrocardiogram (ECG) interpretation. We sought to examine the diagnostic accuracy of ChatGPT-4omni (GPT-4o), equipped with image-evaluation capabilities, in interpreting ECGs with confounding factors, and to compare its performance with that of physicians.
Methods: Twenty ECG cases, divided into Group A (10 cases of AF or atrial flutter) and Group B (10 cases of sinus or another atrial rhythm), were crafted into multiple-choice questions. A total of 100 practitioners (25 each from emergency medicine, internal medicine, primary care, and cardiology) were tasked with identifying the underlying rhythm. GPT-4o was then prompted in five separate sessions.
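The abstract does not publish the exact prompt, image format, or API settings used to query GPT-4o. The following is a minimal sketch, assuming the official OpenAI Python SDK and base64-encoded ECG images, of how such a multiple-choice query might be issued in independent sessions; the prompt wording, file name, and answer choices are illustrative assumptions, not the study's materials.

```python
import base64
from openai import OpenAI  # assumes the official OpenAI Python SDK is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask_rhythm_question(image_path: str, choices: list[str]) -> str:
    """Send one ECG image plus multiple-choice options to GPT-4o (hypothetical prompt wording)."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    options = "\n".join(f"{chr(65 + i)}. {c}" for i, c in enumerate(choices))
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"What is the underlying rhythm in this ECG?\n{options}\n"
                         "Answer with a single letter."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content


# "Five separate sessions": each call below is an independent chat completion,
# so no conversational context carries over between repetitions of a case.
answers = [
    ask_rhythm_question(
        "ecg_case_01.png",  # hypothetical file name
        ["Atrial fibrillation", "Atrial flutter",
         "Sinus rhythm with artifact", "Multifocal atrial tachycardia"],
    )
    for _ in range(5)
]
```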
Results: GPT-4o performed inadequately, averaging 3 (±2) correct answers on the 10 Group A questions and 5.40 (±1.34) on the 10 Group B questions. Across all 20 ECG questions, there was no significant difference in accuracy between GPT-4o and internists or primary care physicians (p = 0.952 and p = 0.852, respectively). Cardiologists outperformed the other medical disciplines and GPT-4o (p < 0.001); emergency physicians followed in accuracy, although their comparison with GPT-4o indicated only a trend (p = 0.068).
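The abstract reports pairwise p-values but does not name the statistical tests used. Purely as an illustration of one way such a group-versus-model accuracy comparison could be run, the sketch below applies a chi-square test to a 2x2 contingency table of correct/incorrect answer counts; the counts are invented placeholders, not the study's data.

```python
from scipy.stats import chi2_contingency

# Illustrative counts only (NOT the study's data): correct vs. incorrect
# answers pooled across all 20 ECG questions for two hypothetical raters.
gpt4o = {"correct": 42, "incorrect": 58}         # e.g., 5 sessions x 20 questions
internists = {"correct": 230, "incorrect": 270}  # e.g., 25 physicians x 20 questions

table = [
    [gpt4o["correct"], gpt4o["incorrect"]],
    [internists["correct"], internists["incorrect"]],
]
chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```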
Conclusion: GPT-4o demonstrated suboptimal accuracy with significant under- and over-recognition of AF in ECGs with confounding factors. Despite its potential as a supportive tool for ECG interpretation, its performance did not surpass that of medical practitioners, underscoring the continued importance of human expertise in complex diagnostics.
Journal introduction:
The Journal of Electrocardiology is devoted exclusively to clinical and experimental studies of the electrical activities of the heart. It seeks to contribute significantly to the accuracy of diagnosis and prognosis and the effective treatment, prevention, or delay of heart disease. Editorial contents include electrocardiography, vectorcardiography, arrhythmias, membrane action potential, cardiac pacing, monitoring, defibrillation, instrumentation, drug effects, and computer applications.