Large Language Models in Intracardiac Electrogram Interpretation: A New Frontier in Cardiac Diagnostics for Pacemaker Patients
Serdar Bozyel, Ahmet Berk Duman, Şadiye Nur Dalgıç, Abdülcebar Şipal, Faysal Şaylık, Şükriye Ebru Gölcük Önder, Metin Çağdaş, Tümer Erdem Güler, Tolga Aksu, Ulas Bağcı, Nurgül Keser
Anatolian Journal of Cardiology. Published 2025-07-09. DOI: 10.14744/AnatolJCardiol.2025.5238
Citations: 0
Abstract
Background: Interpreting intracardiac electrograms (EGMs) requires expertise that many cardiologists lack. Artificial intelligence models like ChatGPT-4o may improve diagnostic accuracy. This study evaluates ChatGPT-4o's performance in EGM interpretation across 4 scenarios (A-D) with increasing contextual information.
Methods: Twenty EGM cases from The EHRA Book of Pacemaker, ICD, and CRT Troubleshooting were analyzed using ChatGPT-4o. Ten predefined features were assessed per case in Scenarios A and B, while Scenarios C and D each required 20 correct responses across all cases. Performance was evaluated over 2 months using McNemar's test, Cohen's Kappa, and Prevalence- and Bias-Adjusted Kappa (PABAK).
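As a point of reference for the statistical workflow described above, the following is a minimal Python sketch of how Cohen's kappa, PABAK, and McNemar's test can be computed for paired binary responses (for example, baseline versus second-month model answers scored as correct or incorrect). The scoring data, variable names, and helper function are hypothetical illustrations, not material from the study.

```python
# Illustrative sketch only (not the authors' code): agreement statistics for
# paired binary responses, e.g. baseline vs. second-month answers scored as
# correct (1) / incorrect (0). All data below are hypothetical.
import numpy as np
from sklearn.metrics import cohen_kappa_score
from statsmodels.stats.contingency_tables import mcnemar

def pabak(a, b):
    """Prevalence- and Bias-Adjusted Kappa: PABAK = 2 * p_o - 1,
    where p_o is the observed proportion of agreement."""
    a, b = np.asarray(a), np.asarray(b)
    p_o = np.mean(a == b)
    return 2 * p_o - 1

baseline = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]   # hypothetical month-0 scores
month_2  = [1, 1, 0, 1, 0, 0, 1, 1, 1, 1]   # hypothetical month-2 scores

# 2x2 contingency table of paired outcomes for McNemar's test
table = np.zeros((2, 2), dtype=int)
for x, y in zip(baseline, month_2):
    table[x, y] += 1

print("Cohen's kappa:", cohen_kappa_score(baseline, month_2))
print("PABAK:", pabak(baseline, month_2))
print("McNemar p-value:", mcnemar(table, exact=True).pvalue)
```

Note that PABAK depends only on observed agreement, which is why it can be high (here, near 1) even when Cohen's kappa is low or negative for rarely occurring categories.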
Results: Providing clinical context enhanced ChatGPT-4o's accuracy, which improved from 57% (Scenario A) to 66% (Scenario B). "No Answer" rates decreased from 19.5% to 8%, while false responses increased from 8.5% to 11%, suggesting occasional misinterpretation. In Scenario A, agreement was high for atrial activity (κ = 0.7) and synchronization (κ = 0.7) but poor for chamber identification (κ = -0.26). In Scenario B, the "understanding" feature achieved near-perfect agreement (PABAK = 1), while ventricular activity remained unreliable (κ = -0.11). In Scenarios C (30%) and D (25%), accuracy was lower, and agreement between baseline and second-month responses was only fair (κ = 0.285 and 0.3, respectively), indicating limited consistency in complex decision-making tasks.
Conclusion: This study provides the first systematic evaluation of ChatGPT-4o in EGM interpretation, demonstrating promising accuracy and reliability in structured tasks. While the model integrated contextual data well, its adaptability to complex cases was limited. Further optimization and validation are needed before clinical use.
About the journal:
The Anatolian Journal of Cardiology is an international monthly periodical on cardiology published according to independent, unbiased, double-blind peer-review principles. The journal's publication language is English.
The Anatolian Journal of Cardiology aims to publish high-quality, original clinical, experimental, and basic research in cardiology at the international level. The journal's scope also covers editorial comments, reviews of innovations in medical education and practice, case reports, original images, scientific letters, educational articles, letters to the editor, articles on publication ethics, diagnostic puzzles, and issues in social cardiology.
The target readership includes academics, specialists, residents, and general practitioners working in the fields of adult cardiology, pediatric cardiology, cardiovascular surgery, and internal medicine.