Prediction of atrial fibrillation admissions in arrhythmia naïve patients from structured electronic health record data.
Tanmay Gokhale, Nirav R Bhatt, Matthew Starr, Suresh Mulukutla, Floyd Thoma, Murat Akcakaya, Salah Al-Zaiti, Raul G Nogueira, Samir Saba
BMC Medical Informatics and Decision Making 25(1):348. Journal Article. Published 2025-09-29. DOI: 10.1186/s12911-025-03199-x (https://doi.org/10.1186/s12911-025-03199-x). Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12482350/pdf/
Impact factor: 3.8. JCR: Q2 (Medical Informatics).
Citations: 0
Abstract
Background: Atrial fibrillation (AF) is the most prevalent sustained arrhythmia, but its diagnosis is often elusive. In this study, we examined the role of machine learning (ML) algorithms in predicting AF in arrhythmia-naïve patients, based on structured domains of the electronic health record (EHR).
Methods: Patients (N = 186,769) with no prior history of AF who had received at least one echocardiogram and had a minimum of 3 months of follow-up were included. Data from the EHR were grouped into domains (demographics; social determinants of health; past medical history; medications; electrocardiogram (EKG); and echocardiogram (Echo)) and tested incrementally for their ability to predict incident hospital admission for AF.
Results: Of the overall cohort, 4,751 patients (2.5%) were admitted for AF over a median follow-up of 35 months. Adding EHR domains incrementally increased the area under the receiver operating characteristic curve (AUROC) for all ML classifiers; Gradient Boosting achieved an AUROC of 0.85 when all domains were included, but its F1 score was poor (14%) at the maximal Youden index. Using the EKG and Echo domains alone achieved performance comparable to that obtained with all EHR domains included. These results were externally validated.
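The Youden index referred to in the Results is J = sensitivity + specificity - 1, and the "maximal Youden index" is the operating point on the ROC curve where J is largest. The following is a minimal sketch of that threshold selection in plain Python. The scores, labels, and function names are invented for illustration; this is not the study's data or code.

```python
# Toy illustration only: the scores and labels below are invented,
# not data from the study.

def confusion_counts(scores, labels, threshold):
    """Return (tp, fp, tn, fn) when score >= threshold is called positive."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def youden_j(tp, fp, tn, fn):
    """Youden index: sensitivity + specificity - 1."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity + specificity - 1.0


def f1_score(tp, fp, fn):
    """F1: harmonic mean of precision and recall, written in count form."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def best_youden_threshold(scores, labels):
    """Try every observed score as a threshold and keep the one maximizing J."""
    candidates = sorted(set(scores))
    return max(
        candidates,
        key=lambda t: youden_j(*confusion_counts(scores, labels, t)),
    )


# Hypothetical model outputs: 2 events among 10 patients.
toy_scores = [0.05, 0.10, 0.20, 0.30, 0.40, 0.55, 0.60, 0.70, 0.80, 0.90]
toy_labels = [0, 0, 0, 0, 0, 0, 1, 0, 0, 1]

threshold = best_youden_threshold(toy_scores, toy_labels)
tp, fp, tn, fn = confusion_counts(toy_scores, toy_labels, threshold)
print("threshold:", threshold)
print("Youden J:", youden_j(tp, fp, tn, fn))
print("F1:", f1_score(tp, fp, fn))
```

On this toy data the threshold that maximizes J captures both events but admits as many false positives as true positives. The Youden index weights sensitivity and specificity equally and ignores how rare the event is, so the threshold it selects need not be the one that maximizes F1.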
Conclusion: Adding more structured EHR domains improves the ability to predict incident AF admissions, but the structured EKG and Echo domains account for most of the gain. Although the ML models exhibited good discrimination, their precision was poor because of the low event rate.
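The link between a low event rate and poor precision follows from Bayes' rule: precision (positive predictive value) = sens × prev / (sens × prev + (1 - spec) × (1 - prev)). The sketch below evaluates this at the cohort's event rate from the abstract (4,751 of 186,769). The sensitivity and specificity of 0.80 are hypothetical placeholders, since the abstract does not report the operating point, and the function names are illustrative.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Precision implied by sensitivity, specificity, and event prevalence."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


def f1_from_precision_recall(precision, recall):
    """F1 as the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Event rate taken from the abstract: 4,751 AF admissions in 186,769 patients.
cohort_prevalence = 4751 / 186769

# ASSUMPTION: hypothetical operating point, not reported in the abstract.
assumed_sensitivity = 0.80
assumed_specificity = 0.80

# Compare the cohort's low event rate with a balanced 50% event rate.
for prevalence in (cohort_prevalence, 0.50):
    ppv = positive_predictive_value(
        assumed_sensitivity, assumed_specificity, prevalence
    )
    f1 = f1_from_precision_recall(ppv, assumed_sensitivity)
    print(f"prevalence={prevalence:.3f}  precision={ppv:.3f}  F1={f1:.3f}")
```

With the classifier held fixed, only the prevalence changes between the two rows. At the cohort's event rate, false positives from the large event-free majority outnumber true positives, which pulls precision and F1 down even though sensitivity and specificity are unchanged.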
About the journal:
BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles in relation to the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.