Validation, bias assessment, and optimization of the UNAFIED 2-year risk prediction model for undiagnosed atrial fibrillation using national electronic health data
Mohammad Ateya, PharmD, MS; Danai Aristeridou, MSc; George H. Sands, MD; Jessica Zielinski, BS; Randall W. Grout, MD, MS; A. Carmine Colavecchia, PharmD, PhD; Oussama Wazni, MD, FHRS; Saira N. Haque, PhD, MHSA
Heart Rhythm O2, volume 5, issue 12, pages 925–935 (December 2024). doi:10.1016/j.hroo.2024.09.010
Abstract
Background
Prediction models for atrial fibrillation (AF) may enable earlier detection and guideline-directed treatment decisions. However, model bias may lead to inaccurate predictions and unintended consequences.
Objective
The purpose of this study was to validate "UNAFIED-10," a 2-year, 10-variable predictive model of undiagnosed AF originally developed using regional data from the Indiana Network for Patient Care, in a national data set, and to assess its bias and improve its generalizability.
Methods
UNAFIED-10 was validated and optimized using the Optum de-identified electronic health record data set. AF diagnoses were recorded in the January 2018–December 2019 period (outcome period), with January 2016–December 2017 as the baseline period. Validation cohorts (patients with AF and non-AF controls, all aged ≥40 years) comprised the full imbalanced data set and a randomly sampled balanced data set. Model performance and bias were evaluated in patient subpopulations defined by sex, insurance, race, and region.
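The abstract does not say how the balanced data set was drawn or which decision threshold defined sensitivity and specificity. The following is a minimal sketch, assuming 1:1 random downsampling of the larger class (in practice the non-AF controls) and a hypothetical record schema with `af`, `score`, and a subgroup field such as `sex`. The helpers `balanced_sample` and `metrics_by_group` and the 0.5 threshold are illustrative assumptions, not the authors' method.

```python
import random
from collections import defaultdict


def balanced_sample(records, seed=0):
    """Randomly downsample the larger class so AF cases and controls are 1:1.

    Assumes each record is a dict with a boolean "af" label (hypothetical schema).
    """
    rng = random.Random(seed)
    cases = [r for r in records if r["af"]]
    controls = [r for r in records if not r["af"]]
    n = min(len(cases), len(controls))
    sample = rng.sample(cases, n) + rng.sample(controls, n)
    rng.shuffle(sample)
    return sample


def metrics_by_group(records, group_key, threshold=0.5):
    """Sensitivity and specificity within each level of group_key (e.g. "sex").

    A record is predicted AF when its "score" is at or above threshold;
    the 0.5 default is illustrative only.
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
    for r in records:
        c = counts[r[group_key]]
        predicted_af = r["score"] >= threshold
        if r["af"]:
            c["tp" if predicted_af else "fn"] += 1
        else:
            c["fp" if predicted_af else "tn"] += 1
    results = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["tn"] + c["fp"]
        results[group] = {
            "sensitivity": c["tp"] / positives if positives else float("nan"),
            "specificity": c["tn"] / negatives if negatives else float("nan"),
        }
    return results


# Toy data (hypothetical), not drawn from the Optum data set.
patients = [
    {"af": True, "score": 0.9, "sex": "F"},
    {"af": True, "score": 0.3, "sex": "F"},
    {"af": False, "score": 0.2, "sex": "F"},
    {"af": True, "score": 0.8, "sex": "M"},
    {"af": False, "score": 0.7, "sex": "M"},
    {"af": False, "score": 0.1, "sex": "M"},
    {"af": False, "score": 0.4, "sex": "M"},
]
print(len(balanced_sample(patients)))  # 6
print(metrics_by_group(patients, "sex")["F"])  # {'sensitivity': 0.5, 'specificity': 1.0}
```

Large gaps in sensitivity or specificity between subgroups (for example, between insurance types) are one signal of the kind of bias the study assessed; the abstract does not state which metrics or cutoffs the authors used to judge bias.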
Results
Of the 6,058,657 eligible patients (mean age 60 ± 12 years), 4.1% (n = 246,975) had their first AF diagnosis within the outcome period. The validated UNAFIED-10 model achieved a higher C-statistic (0.85 [95% confidence interval 0.85–0.86] vs 0.81 [0.80–0.81]) and sensitivity (86% vs 74%) but lower specificity (66% vs 74%) than the original UNAFIED-10 model. During retraining and optimization, the variables insurance, shock, and albumin were excluded to address bias and improve generalizability. This generated an 8-variable model (UNAFIED-8) with consistent performance.
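Two of the reported quantities can be illustrated directly. The first-AF proportion is simple arithmetic (246,975 / 6,058,657 ≈ 4.1%). For a binary outcome, the C-statistic equals the probability that a randomly chosen patient with AF receives a higher risk score than a randomly chosen control, which is the same as the area under the ROC curve. The sketch below computes it by brute-force pairwise comparison on toy scores. `c_statistic` is a hypothetical helper, and the confidence-interval method used in the study is not described in the abstract and is not reproduced here.

```python
def c_statistic(scores, labels):
    """Concordance (C) statistic for a binary outcome.

    Fraction of (case, control) pairs in which the case has the higher
    score; tied pairs count as 0.5. Runs in O(cases * controls) time,
    which is fine for illustration but not for millions of patients.
    """
    cases = [s for s, y in zip(scores, labels) if y]
    controls = [s for s, y in zip(scores, labels) if not y]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    concordant = 0.0
    for a in cases:
        for b in controls:
            if a > b:
                concordant += 1.0
            elif a == b:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Proportion of eligible patients with a first AF diagnosis in the outcome
# period (counts taken from the abstract).
prevalence = 246_975 / 6_058_657
print(f"{prevalence:.1%}")  # 4.1%

# Toy risk scores (hypothetical): two AF cases, two controls.
print(c_statistic([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # 0.75
```

For data sets of this size, a rank-based formula (Mann–Whitney U) or a library routine such as scikit-learn's `roc_auc_score` avoids the quadratic pair loop.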
Conclusion
UNAFIED-10, developed using regional patient data, displayed consistent performance in a large national data set. UNAFIED-8 is more parsimonious and more generalizable, supporting the use of advanced analytics for AF detection. Future directions include validation on additional data sets.