Algorithm Selection for Estimating Causal Effects: Nulliparous Pregnancy Outcomes Study: Monitoring Mothers to Be.
Zhaohua Zeng, Lisa M Bodnar, Ashley I Naimi
Epidemiology, pp. 760-768. Published 2025-11-01 (Epub 2025-08-15). DOI: https://doi.org/10.1097/EDE.0000000000001906
Background: The Super Learner is an ensemble learning method that has been widely used with doubly robust causal effect estimators. It is recommended to deploy the Super Learner with a diverse library of algorithms. To our knowledge, however, the magnitude of the improvements gained by including many algorithms has not yet been systematically evaluated in common epidemiologic research settings.
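The idea behind the Super Learner can be sketched in a few lines: each candidate algorithm produces cross-validated (out-of-fold) predictions, and a meta-learning step picks the convex combination of candidates with the lowest cross-validated loss. The following is a minimal illustration under simplifying assumptions, not the authors' implementation: the two toy candidates (`fit_mean`, `fit_linear`), the continuous toy outcome, the helper names, and the grid search standing in for a constrained optimizer are all hypothetical choices made for brevity.

```python
def fit_mean(x, y):
    """Toy candidate 1: always predicts the training mean."""
    m = sum(y) / len(y)
    return lambda xi: m

def fit_linear(x, y):
    """Toy candidate 2: simple least-squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    return lambda xi: my + slope * (xi - mx)

def cv_predictions(x, y, fit, folds=5):
    """Out-of-fold predictions for one candidate learner."""
    n = len(x)
    preds = [0.0] * n
    for k in range(folds):
        train = [i for i in range(n) if i % folds != k]
        model = fit([x[i] for i in train], [y[i] for i in train])
        for i in range(n):
            if i % folds == k:  # predict only on the held-out fold
                preds[i] = model(x[i])
    return preds

def linear_weight(p_mean, p_lin, y, steps=100):
    """Convex weight on the linear learner minimizing CV squared error."""
    best_w, best_loss = 0.0, float("inf")
    for s in range(steps + 1):
        w = s / steps
        loss = sum(
            (w * pl + (1 - w) * pm - yi) ** 2
            for pm, pl, yi in zip(p_mean, p_lin, y)
        ) / len(y)
        if loss < best_loss:
            best_w, best_loss = w, loss
    return best_w

# Hypothetical toy data: a linear signal with small alternating noise.
x = [i / 20 for i in range(40)]
y = [2 * xi + 0.1 * (-1) ** i for i, xi in enumerate(x)]

w_lin = linear_weight(
    cv_predictions(x, y, fit_mean), cv_predictions(x, y, fit_linear), y
)

# Final ensemble: refit each candidate on all data, combine with CV weights.
full_mean, full_lin = fit_mean(x, y), fit_linear(x, y)

def super_learner(xi):
    return w_lin * full_lin(xi) + (1 - w_lin) * full_mean(xi)
```

On these toy data nearly all of the weight goes to the linear candidate, which is the point of the meta-learning step: poorly performing algorithms in the library are down-weighted rather than trusted.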
Methods: We applied Super Learning with two doubly robust estimators, augmented inverse probability weighting (AIPW) and targeted minimum loss-based estimation (TMLE), to estimate the average treatment effect (ATE) of high periconceptional dietary fruit and vegetable density on the risk of preeclampsia among 7,923 women from the nuMoM2b study. Using a reference ensemble with a diverse library of algorithms, we compared estimates under different sets of algorithms included in the Super Learner to evaluate whether ATE estimates were sensitive to library choices.
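As a rough illustration of the AIPW estimator named above, the sketch below computes an ATE on the risk difference scale together with a Wald-type 95% confidence interval based on the sample variance of the estimated influence function. It is not the authors' code: the function name `aipw_ate` and the toy inputs are hypothetical, and the nuisance predictions (propensity scores and outcome predictions), which in the study would come from Super Learner fits, are simply passed in as lists here.

```python
import math

def aipw_ate(y, a, ps, mu1, mu0):
    """AIPW estimate of the ATE (risk difference) with a Wald 95% CI.

    y   : observed outcomes
    a   : binary exposure indicators (0/1)
    ps  : estimated propensity scores P(A=1 | W)
    mu1 : estimated outcome predictions E[Y | A=1, W]
    mu0 : estimated outcome predictions E[Y | A=0, W]
    """
    n = len(y)
    # Per-observation contributions: outcome-model contrast plus
    # inverse-probability-weighted residual corrections.
    phi = [
        (m1 - m0) + ai * (yi - m1) / p - (1 - ai) * (yi - m0) / (1 - p)
        for yi, ai, p, m1, m0 in zip(y, a, ps, mu1, mu0)
    ]
    ate = sum(phi) / n
    # Standard error from the sample variance of the contributions.
    var = sum((v - ate) ** 2 for v in phi) / (n - 1)
    se = math.sqrt(var / n)
    return ate, (ate - 1.96 * se, ate + 1.96 * se)

# Hypothetical toy inputs, for illustration only (not study data).
y = [1, 0, 0, 1, 1, 0]
a = [1, 1, 1, 0, 0, 0]
ps = [0.5] * 6
mu1 = [0.25] * 6
mu0 = [0.5] * 6

ate, ci = aipw_ate(y, a, ps, mu1, mu0)
```

In this toy setup the propensity score is constant and equal to the observed exposure proportion, so the AIPW estimate reduces to the difference in outcome means between exposed and unexposed observations. TMLE uses the same nuisance predictions but updates the outcome predictions through a targeting step, which is not shown here.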
Results: The doubly robust estimators fitted with the reference Super Learner ensemble suggested that a total fruit and vegetable density of ≥2.5 cups/1,000 kcal was associated with a lower risk of preeclampsia. The ATE estimated on the risk difference scale was -0.019 (95% confidence interval = -0.036, -0.003) by AIPW and -0.023 (95% confidence interval = -0.039, -0.007) by TMLE. Excluding any individual algorithm from the reference ensemble had little impact on estimates from either AIPW or TMLE. However, relying on a single algorithm (e.g., extreme gradient boosting) yielded results that were much more variable.
Conclusion: Our empirical findings support recommendations to build ensemble learners for doubly robust estimators using a diverse array of flexible machine learning algorithms.
About the journal:
Epidemiology publishes original research from all fields of epidemiology. The journal also welcomes review articles and meta-analyses, novel hypotheses, descriptions and applications of new methods, and discussions of research theory or public health policy. We give special consideration to papers from developing countries.