Algorithm Selection for Estimating Causal Effects: Nulliparous Pregnancy Outcomes Study: Monitoring Mothers to Be.

IF 4.4 | Zone 2 (Medicine) | JCR Q1 | Public, Environmental & Occupational Health
Epidemiology Pub Date : 2025-11-01 Epub Date: 2025-08-15 DOI:10.1097/EDE.0000000000001906
Zhaohua Zeng, Lisa M Bodnar, Ashley I Naimi
Epidemiology, pages 760-768. Journal Article. https://doi.org/10.1097/EDE.0000000000001906
Citations: 0

Abstract

Algorithm Selection for Estimating Causal Effects: Nulliparous Pregnancy Outcomes Study: Monitoring Mothers to Be.

Background: The Super Learner is an ensemble learning method that has been widely used with doubly robust causal effect estimators. It is recommended to deploy the Super Learner with a diverse library of algorithms. To our knowledge, however, the magnitude of the improvements gained by including many algorithms has not yet been systematically evaluated in common epidemiologic research settings.
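
To make the ensemble idea concrete, the following is a minimal sketch of how a Super Learner style meta-learner can combine candidate algorithms: it picks convex weights that minimize the squared-error loss of the blended cross-validated predictions. It is an illustration under stated assumptions, not the authors' implementation. The function names (`mse`, `convex_weights`), the learner labels, and the coarse grid search are all hypothetical; practical Super Learner software typically fits the weights with a constrained regression on V-fold cross-validated predictions rather than a grid.

```python
from itertools import product


def mse(y, p):
    """Mean squared error between outcomes y and predictions p."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / len(y)


def convex_weights(y, cv_preds, steps=20):
    """Grid-search convex weights for blending candidate learners.

    y        : list of observed outcomes
    cv_preds : dict mapping a (hypothetical) learner name to its
               cross-validated predictions, one per observation
    steps    : grid resolution; weights are multiples of 1/steps

    Returns (weights_by_name, loss_of_best_blend).
    """
    names = list(cv_preds)
    best_w, best_loss = None, float("inf")
    # Enumerate integer tuples that sum to `steps`, i.e. points on the simplex.
    for combo in product(range(steps + 1), repeat=len(names)):
        if sum(combo) != steps:
            continue
        w = [c / steps for c in combo]
        blend = [
            sum(wk * cv_preds[n][i] for wk, n in zip(w, names))
            for i in range(len(y))
        ]
        loss = mse(y, blend)
        if loss < best_loss:
            best_w, best_loss = w, loss
    return dict(zip(names, best_w)), best_loss


# Toy usage: learner "A" predicts perfectly, learner "B" is uninformative.
y = [0, 1, 0, 1]
cv_preds = {"A": [0, 1, 0, 1], "B": [0.5, 0.5, 0.5, 0.5]}
weights, loss = convex_weights(y, cv_preds)
print(weights, loss)
```

In this toy case all weight goes to the learner that predicts well, which is the mechanism a diverse library relies on: poorly performing candidates are down-weighted rather than trusted.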

Methods: We applied Super Learning with two doubly robust estimators, augmented inverse probability weighting (AIPW) and targeted minimum loss-based estimation (TMLE), to estimate the average treatment effect (ATE) of high periconceptional dietary fruit and vegetable density on the risk of preeclampsia among 7,923 women from the nuMoM2b study. Using a reference ensemble with a diverse library of algorithms, we compared estimates under different sets of algorithms included in the Super Learner to evaluate whether ATE estimates were sensitive to library choices.
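
As a sketch of what an AIPW estimate on the risk difference scale involves, the snippet below computes the estimate and a Wald-type 95% confidence interval from already-fitted nuisance predictions. It assumes the standard AIPW form: each observation contributes mu1 - mu0 + A(Y - mu1)/pi - (1 - A)(Y - mu0)/(1 - pi), and the estimate is the sample mean of these contributions. The function name `aipw_risk_difference` and its arguments are hypothetical; in the study the nuisance predictions would come from the Super Learner, and details such as cross-fitting are not shown here.

```python
from math import sqrt
from statistics import mean, stdev


def aipw_risk_difference(y, a, pi, mu1, mu0, z=1.96):
    """AIPW estimate of the average treatment effect (risk difference).

    y   : binary outcomes (0/1)
    a   : binary exposure indicators (0/1)
    pi  : predicted exposure probabilities P(A=1 | covariates), strictly in (0, 1)
    mu1 : predicted outcome risk had everyone been exposed
    mu0 : predicted outcome risk had everyone been unexposed
    z   : normal quantile for the Wald interval (1.96 for 95%)

    Returns (estimate, (lower, upper)).
    """
    # Per-observation contribution: outcome-model contrast plus
    # inverse-probability-weighted residual corrections.
    psi = [
        m1 - m0 + ai * (yi - m1) / p - (1 - ai) * (yi - m0) / (1 - p)
        for yi, ai, p, m1, m0 in zip(y, a, pi, mu1, mu0)
    ]
    est = mean(psi)
    se = stdev(psi) / sqrt(len(psi))  # standard error from the sample SD
    return est, (est - z * se, est + z * se)


# Toy usage with made-up numbers (not study data).
y = [1, 0, 1, 0]
a = [1, 1, 0, 0]
half = [0.5] * 4
est, ci = aipw_risk_difference(y, a, half, half, half)
print(est, ci)
```

The interval here is the simple Wald construction; whether the reported intervals were built exactly this way is an assumption.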

Results: The doubly robust estimators fitted with the reference Super Learner ensemble suggested ≥2.5 cups/1,000 kcal of total fruit and vegetable density was associated with a lower risk of preeclampsia. ATE estimated on the risk difference scale by AIPW was -0.019 (95% confidence interval = -0.036, -0.003) and by TMLE was -0.023 (95% confidence interval = -0.039, -0.007). Excluding any individual algorithm from the reference ensemble had little impact on estimates from either AIPW or TMLE. However, relying on a single algorithm (e.g., extreme gradient boosting) yielded results that were much more variable.
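
The comparison described above can be organized as a set of library variants: the reference library, each leave-one-algorithm-out library, and each single-algorithm library. The sketch below only enumerates those variants. The helper name `library_variants` and the algorithm labels are hypothetical placeholders (the abstract names only extreme gradient boosting), and whether every algorithm was also evaluated on its own is an assumption.

```python
def library_variants(reference):
    """Build the libraries for a sensitivity analysis of library choice.

    reference : list of algorithm labels in the reference ensemble

    Returns a dict mapping a variant name to its list of algorithms:
    the full reference, one variant per excluded algorithm, and one
    variant per algorithm used alone.
    """
    variants = {"reference": list(reference)}
    for name in reference:
        variants[f"without_{name}"] = [n for n in reference if n != name]
        variants[f"only_{name}"] = [name]
    return variants


# Hypothetical labels; the actual reference library is not listed in the abstract.
reference = ["glm", "random_forest", "xgboost"]
for variant, algorithms in library_variants(reference).items():
    print(variant, algorithms)
```

Each variant would then be passed to the same estimation pipeline, so that differences in the resulting ATE estimates can be attributed to the library rather than to other analytic choices.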

Conclusion: Our empirical findings support recommendations to build ensemble learners for doubly robust estimators using a diverse array of flexible machine learning algorithms.

Source journal
Epidemiology (Medicine: Public, Environmental & Occupational Health)
CiteScore: 6.70
Self-citation rate: 3.70%
Articles published: 177
Review time: 6-12 weeks
About the journal: Epidemiology publishes original research from all fields of epidemiology. The journal also welcomes review articles and meta-analyses, novel hypotheses, descriptions and applications of new methods, and discussions of research theory or public health policy. We give special consideration to papers from developing countries.