How Effective Are Machine Learning and Doubly Robust Estimators in Incorporating High-Dimensional Proxies to Reduce Residual Confounding?

IF 2.4 | Zone 4, Medicine | Q3, PHARMACOLOGY & PHARMACY
Mohammad Ehsanul Karim, Yang Lei
Pharmacoepidemiology and Drug Safety, vol. 34, no. 5, e70155. Journal Article. Published 2025-05-01.
DOI: 10.1002/pds.70155 (https://doi.org/10.1002/pds.70155)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12076102/pdf/
Citations: 0

Abstract

Background: Residual confounding presents a persistent challenge in observational studies, particularly in high-dimensional settings. High-dimensional proxy adjustment methods, such as the high-dimensional propensity score (hdPS), are widely used to address confounding bias by incorporating proxies for unmeasured confounders. Extensions of hdPS have integrated machine learning, such as LASSO and super learner (SL), and doubly robust estimators, such as targeted maximum likelihood estimation (TMLE). However, the comparative performance of these methods, especially under different learner configurations and high-dimensional proxies, remains unclear.
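To make "doubly robust" concrete, here is a minimal sketch of an augmented inverse probability weighting (AIPW) estimator of a risk difference. AIPW is a simpler relative of TMLE, not TMLE itself, and the abstract does not say which estimand or implementation the authors used. The function name `aipw_risk_difference` and its arguments are hypothetical; the propensity scores `g` and the outcome predictions `q1` and `q0` are assumed to come from models fitted elsewhere.

```python
from math import sqrt
from statistics import mean, stdev


def aipw_risk_difference(a, y, g, q1, q0):
    """AIPW estimate of E[Y(1)] - E[Y(0)] with an influence-function standard error.

    a  : 0/1 exposure indicators
    y  : observed outcomes
    g  : estimated propensity scores P(A=1 | W), assumed strictly between 0 and 1
    q1 : predicted outcomes under exposure,    E[Y | A=1, W]
    q0 : predicted outcomes under no exposure, E[Y | A=0, W]
    """
    contributions = []
    for ai, yi, gi, q1i, q0i in zip(a, y, g, q1, q0):
        # Outcome-model prediction plus an inverse-probability-weighted residual.
        treated = q1i + ai / gi * (yi - q1i)
        control = q0i + (1 - ai) / (1 - gi) * (yi - q0i)
        contributions.append(treated - control)
    estimate = mean(contributions)
    # The per-subject contributions minus the estimate form the estimated
    # influence function, so their sample SD gives the standard error.
    std_error = stdev(contributions) / sqrt(len(contributions))
    return estimate, std_error


# Toy inputs, made up purely for illustration.
est, se = aipw_risk_difference(
    a=[1, 0, 1, 0],
    y=[1, 0, 1, 1],
    g=[0.5, 0.5, 0.5, 0.5],
    q1=[0.8, 0.8, 0.8, 0.8],
    q0=[0.4, 0.4, 0.4, 0.4],
)
```

The estimator is consistent if either the propensity model or the outcome model is correctly specified, which is the sense in which such estimators are called doubly robust.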

Method: We conducted plasmode simulations to evaluate the performance of standard methods, SL, TMLE, and double cross-fit TMLE (DC-TMLE) under varying exposure and outcome prevalence scenarios. Learner libraries included: 1 learner (logistic regression), 3 learners (logistic regression, MARS, and LASSO), and 4 learners (adding XGBoost, a non-Donsker learner). Metrics included bias, coverage, and variability.
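The three metrics named above can be computed from the per-replicate results of a simulation roughly as follows. This is a minimal sketch under assumptions: the function name `summarize_simulation` is hypothetical, confidence intervals are taken to be Wald-type (estimate ± 1.96 × standard error), and variability is summarized as the empirical standard deviation of the estimates. The abstract does not specify the authors' exact definitions.

```python
from statistics import mean, stdev


def summarize_simulation(estimates, std_errors, true_value, z=1.96):
    """Summarize simulation replicates by bias, variability, and CI coverage.

    estimates  : point estimates, one per simulated dataset
    std_errors : model-based standard errors, one per simulated dataset
    true_value : the known effect built into the simulation
    z          : normal quantile for the interval (1.96 for a 95% interval)
    """
    bias = mean(estimates) - true_value
    empirical_se = stdev(estimates)  # spread of the estimates across replicates
    covered = [
        est - z * se <= true_value <= est + z * se
        for est, se in zip(estimates, std_errors)
    ]
    coverage = sum(covered) / len(covered)
    return {"bias": bias, "empirical_se": empirical_se, "coverage": coverage}


# Toy replicates, made up purely for illustration.
summary = summarize_simulation(
    estimates=[0.9, 1.0, 1.1, 1.6],
    std_errors=[0.1, 0.1, 0.1, 0.1],
    true_value=1.0,
)
```

Comparing the empirical standard deviation with the average model-based standard error is a common way to see why coverage falls short: intervals are too narrow when the latter is much smaller than the former.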

Results: Methods without proxies exhibited the highest bias and poorest coverage, highlighting the critical role of proxies in confounding adjustment. Standard methods incorporating high-dimensional proxies showed robust performance, achieving low bias and near-nominal coverage. TMLE and DC-TMLE reduced bias but exhibited worse coverage compared to standard methods, particularly with larger learner libraries. Notably, DC-TMLE, expected to address under-coverage issues, failed to perform adequately in high-dimensional settings with non-Donsker learners, further emphasizing the instability introduced by complex libraries.
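Cross-fit estimators such as DC-TMLE rest on sample splitting: nuisance models are fitted on one part of the data and used to predict on another, so that flexible learners are not evaluated on the observations they were trained on. The sketch below shows only this generic cross-fitting idea, not the specific double cross-fit scheme evaluated in the paper. The function `cross_fit_predictions` and the `fit` and `predict` callables are hypothetical placeholders for any learner.

```python
import random


def cross_fit_predictions(rows, fit, predict, n_folds=3, seed=0):
    """Return out-of-fold predictions for every row.

    rows    : list of observations (any type the callables understand)
    fit     : callable(training_rows) -> model
    predict : callable(model, row) -> prediction
    """
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into n_folds roughly equal folds.
    folds = [indices[k::n_folds] for k in range(n_folds)]
    predictions = [None] * len(rows)
    for k, held_out in enumerate(folds):
        # Train on every fold except the k-th.
        training_rows = [
            rows[i] for j, fold in enumerate(folds) if j != k for i in fold
        ]
        model = fit(training_rows)
        # Predict only for rows the model has not seen.
        for i in held_out:
            predictions[i] = predict(model, rows[i])
    return predictions


# Toy learner: the "model" is the training mean, predicted for every row.
toy_rows = [1, 2, 3, 4, 5, 6]
toy_predictions = cross_fit_predictions(
    toy_rows,
    fit=lambda train: sum(train) / len(train),
    predict=lambda model, row: model,
)
```

In practice the whole procedure is usually repeated over several random splits and the results combined, since a single split adds its own source of variability.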

Conclusion: Our findings underscore the utility of high-dimensional proxies in standard methods and the importance of tailoring learner configurations in SL and TMLE to ensure reliable confounding adjustment in high-dimensional contexts.

Source journal
CiteScore: 4.80
Self-citation rate: 7.70%
Articles published: 173
Review time: 3 months
About the journal: The aim of Pharmacoepidemiology and Drug Safety is to provide an international forum for the communication and evaluation of data, methods and opinion in the discipline of pharmacoepidemiology. The Journal publishes peer-reviewed reports of original research, invited reviews and a variety of guest editorials and commentaries embracing scientific, medical, statistical, legal and economic aspects of pharmacoepidemiology and post-marketing surveillance of drug safety. Appropriate material in these categories may also be considered for publication as a Brief Report. Particular areas of interest include: design, analysis, results, and interpretation of studies looking at the benefit or safety of specific pharmaceuticals, biologics, or medical devices, including studies in pharmacovigilance, postmarketing surveillance, pharmacoeconomics, patient safety, molecular pharmacoepidemiology, or any other study within the broad field of pharmacoepidemiology; comparative effectiveness research relating to pharmaceuticals, biologics, and medical devices. Comparative effectiveness research is the generation and synthesis of evidence that compares the benefits and harms of alternative methods to prevent, diagnose, treat, and monitor a clinical condition, as these methods are truly used in the real world; methodologic contributions of relevance to pharmacoepidemiology, whether original contributions, reviews of existing methods, or tutorials for how to apply the methods of pharmacoepidemiology; assessments of harm versus benefit in drug therapy; patterns of drug utilization; relationships between pharmacoepidemiology and the formulation and interpretation of regulatory guidelines; evaluations of risk management plans and programmes relating to pharmaceuticals, biologics and medical devices.