Mohammad Ateya, PharmD, MS; Danai Aristeridou, MSc; George H. Sands, MD; Jessica Zielinski, BS; Randall W. Grout, MD, MS; A. Carmine Colavecchia, PharmD, PhD; Oussama Wazni, MD, FHRS; Saira N. Haque, PhD, MHSA
Heart Rhythm O2, volume 5, issue 12, pages 925-935. Published 2024-12-01. DOI: 10.1016/j.hroo.2024.09.010
Article page: https://www.sciencedirect.com/science/article/pii/S2666501824003015
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11721729/pdf/
Validation, bias assessment, and optimization of the UNAFIED 2-year risk prediction model for undiagnosed atrial fibrillation using national electronic health data
Background
Prediction models for atrial fibrillation (AF) may enable earlier detection and guideline-directed treatment decisions. However, model bias may lead to inaccurate predictions and unintended consequences.
Objective
The purpose of this study was to validate, assess the bias of, and improve the generalizability of “UNAFIED-10,” a 2-year, 10-variable predictive model of undiagnosed AF, in a national data set. The model was originally developed using regional data from the Indiana Network for Patient Care.
Methods
UNAFIED-10 was validated and optimized using the Optum de-identified electronic health record data set. AF diagnoses were recorded in the outcome period (January 2018–December 2019), with January 2016–December 2017 as the baseline period. Validation cohorts (patients with AF and non-AF controls, aged ≥40 years) comprised the full imbalanced data set and a randomly sampled balanced data set. Model performance and bias were evaluated in patient subpopulations defined by sex, insurance, race, and region.
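The two steps described above, drawing a balanced sample from an imbalanced cohort and comparing performance across subpopulations, can be sketched roughly as follows. This is a minimal illustration on synthetic records, not the authors' pipeline: the field names (`af`, `risk`, `sex`), the 0.5 threshold, and the choice of random undersampling of controls are all assumptions made for the example.

```python
import random
from collections import defaultdict

def balanced_sample(records, seed=0):
    """Randomly undersample the larger class so cases and controls are equal in number."""
    rng = random.Random(seed)
    cases = [r for r in records if r["af"] == 1]
    controls = [r for r in records if r["af"] == 0]
    n = min(len(cases), len(controls))
    return rng.sample(cases, n) + rng.sample(controls, n)

def subgroup_rates(records, group_key, threshold=0.5):
    """Sensitivity and specificity per subgroup at a fixed (hypothetical) risk threshold."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
    for r in records:
        flagged = r["risk"] >= threshold
        c = counts[r[group_key]]
        if r["af"] == 1:
            c["tp" if flagged else "fn"] += 1
        else:
            c["fp" if flagged else "tn"] += 1
    rates = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["tn"] + c["fp"]
        rates[group] = {
            "sensitivity": c["tp"] / positives if positives else None,
            "specificity": c["tn"] / negatives if negatives else None,
        }
    return rates

# Tiny synthetic cohort; values are made up for illustration only.
cohort = [
    {"af": 1, "risk": 0.9, "sex": "F"},
    {"af": 1, "risk": 0.4, "sex": "M"},
    {"af": 0, "risk": 0.2, "sex": "F"},
    {"af": 0, "risk": 0.7, "sex": "M"},
    {"af": 0, "risk": 0.1, "sex": "F"},
    {"af": 0, "risk": 0.3, "sex": "M"},
]
balanced = balanced_sample(cohort, seed=42)
rates = subgroup_rates(cohort, "sex")
```

Large gaps in sensitivity or specificity between subgroups are one simple signal of potential bias; the same function could be called with a different `group_key` such as insurance, race, or region.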
Results
Of the 6,058,657 eligible patients (mean age 60 ± 12 years), 4.1% (n = 246,975) had their first AF diagnosis within the outcome period. The validated UNAFIED-10 model achieved a higher C-statistic (0.85 [95% confidence interval 0.85–0.86] vs 0.81 [0.80–0.81]) and sensitivity (86% vs 74%) but lower specificity (66% vs 74%) than the original UNAFIED-10 model. During retraining and optimization, the variables insurance, shock, and albumin were excluded to address bias and improve generalizability. This generated an 8-variable model (UNAFIED-8) with consistent performance.
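For readers less familiar with the C-statistic, the sketch below computes it as the fraction of case-control pairs in which the case receives the higher predicted risk, with ties counted as one half. The labels and scores are synthetic, and the pairwise loop is chosen for clarity only; at the scale of the cohort reported here a rank-based computation would be needed. The final lines check the reported incidence arithmetic using the counts given above.

```python
def c_statistic(labels, scores):
    """Fraction of case-control pairs ranked correctly; ties count as 0.5."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    concordant = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                concordant += 1.0
            elif case_score == control_score:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))

# Synthetic example: 2 cases, 3 controls.
labels = [1, 1, 0, 0, 0]
scores = [0.8, 0.4, 0.5, 0.3, 0.4]
auc = c_statistic(labels, scores)

# Incidence check from the counts in the Results paragraph.
incidence_pct = round(100 * 246_975 / 6_058_657, 1)
```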
Conclusion
UNAFIED-10, developed using regional patient data, displayed consistent performance in a large national data set. UNAFIED-8 is more parsimonious and generalizable, which supports its use in advanced analytics for AF detection. Future directions include validation on additional data sets.