Evaluation of Machine Learning-Based Propensity Score Estimation: A Benchmarking Observational Analysis Against a Randomized Trial.

Kaicheng Wang, Lindsey Rosman, Haidong Lu
medRxiv : the preprint server for health sciences. Published 2025-06-17. DOI: 10.1101/2025.06.16.25329708. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12204248/pdf/
Citations: 0

Abstract

Machine learning (ML) approaches to propensity score estimation are increasingly used in the expectation that they will improve covariate balance and reduce bias, but their validity in selecting appropriate confounders remains controversial. In this study, we estimated the effectiveness of sacubitril/valsartan versus angiotensin-converting enzyme inhibitors or angiotensin receptor blockers on all-cause mortality among heart failure patients with implantable cardioverter defibrillators in the U.S. Department of Veterans Affairs from 2016 to 2020. We compared results from traditional logistic regression-based and ML-based propensity score methods and benchmarked them against the PARADIGM-HF randomized trial. The estimate from logistic regression with a priori confounder selection (HR = 0.93, 95% CI 0.61 - 1.42; 27-month RR = 0.87, 95% CI 0.59 - 1.21) aligned most closely with the trial result (HR = 0.81; 95% CI 0.61 - 1.06). In contrast, generalized boosting models did not outperform traditional logistic regression and may amplify bias when combined with data-driven confounder selection (HR = 0.63, 95% CI 0.31 - 1.30; RR = 0.61, 95% CI 0.33 - 1.04). Our findings suggest that ML-based propensity scores may introduce overadjustment bias, and they underscore the importance of subject-matter knowledge in causal inference with high-dimensional real-world data.
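For readers unfamiliar with the workflow being compared, the following is a minimal sketch of logistic-regression propensity score estimation followed by a covariate balance check. It is not the authors' analysis. It assumes simulated data with a single hypothetical confounder `x`, a hand-rolled gradient-ascent fit, inverse probability weighting as the way the scores are applied, and the standardized mean difference (SMD) as the balance metric; the abstract specifies none of these choices.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ts, lr=1.0, steps=500):
    """Fit P(T=1 | x) = sigmoid(b0 + b1*x) by batch gradient ascent."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, t in zip(xs, ts):
            r = t - sigmoid(b0 + b1 * x)  # residual drives the gradient
            g0 += r
            g1 += r * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


def weighted_mean(vals, ws):
    return sum(v * w for v, w in zip(vals, ws)) / sum(ws)


def smd(xs, ts, ws):
    """Standardized mean difference of x between arms under weights ws."""
    x1 = [x for x, t in zip(xs, ts) if t == 1]
    w1 = [w for w, t in zip(ws, ts) if t == 1]
    x0 = [x for x, t in zip(xs, ts) if t == 0]
    w0 = [w for w, t in zip(ws, ts) if t == 0]
    m1, m0 = weighted_mean(x1, w1), weighted_mean(x0, w0)
    v1 = weighted_mean([(x - m1) ** 2 for x in x1], w1)
    v0 = weighted_mean([(x - m0) ** 2 for x in x0], w0)
    return (m1 - m0) / math.sqrt((v1 + v0) / 2.0)


# Simulated cohort (assumption): treatment is more likely when x is high.
random.seed(0)
n = 1000
xs = [random.gauss(0.0, 1.0) for _ in range(n)]
ts = [1 if random.random() < sigmoid(-0.5 + 1.0 * x) else 0 for x in xs]

# Estimate propensity scores, then form inverse probability weights.
b0, b1 = fit_logistic(xs, ts)
ps = [sigmoid(b0 + b1 * x) for x in xs]
ipw = [1.0 / p if t == 1 else 1.0 / (1.0 - p) for p, t in zip(ps, ts)]

smd_raw = smd(xs, ts, [1.0] * n)
smd_ipw = smd(xs, ts, ipw)
print(f"SMD before weighting: {smd_raw:.3f}")
print(f"SMD after weighting:  {smd_ipw:.3f}")
```

In this toy setting the weighted SMD should shrink toward zero because the propensity model is correctly specified. Good balance on measured covariates does not by itself show that the right covariates were chosen, which is the concern the abstract raises about data-driven confounder selection.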
