Causal machine learning for heterogeneous treatment effects in the presence of missing outcome data.

IF 1.7 4区数学 Q3 BIOLOGY

Biometrics Pub Date : 2025-07-03 DOI:10.1093/biomtc/ujaf098

Matthew Pryce, Karla Diaz-Ordaz, Ruth H Keogh, Stijn Vansteelandt

{"title":"Causal machine learning for heterogeneous treatment effects in the presence of missing outcome data.","authors":"Matthew Pryce, Karla Diaz-Ordaz, Ruth H Keogh, Stijn Vansteelandt","doi":"10.1093/biomtc/ujaf098","DOIUrl":null,"url":null,"abstract":"<p><p>When estimating heterogeneous treatment effects, missing outcome data can complicate treatment effect estimation, causing certain subgroups of the population to be poorly represented. In this work, we discuss this commonly overlooked problem and consider the impact that missing at random outcome data has on causal machine learning estimators for the conditional average treatment effect (CATE). We propose 2 de-biased machine learning estimators for the CATE, the mDR-learner, and mEP-learner, which address the issue of under-representation by integrating inverse probability of censoring weights into the DR-learner and EP-learner, respectively. We show that under reasonable conditions, these estimators are oracle efficient and illustrate their favorable performance through simulated data settings, comparing them to existing CATE estimators, including comparison to estimators that use common missing data techniques. We present an example of their application using the GBSG2 trial, exploring treatment effect heterogeneity when comparing hormonal therapies to non-hormonal therapies among breast cancer patients post surgery, and offer guidance on the decisions a practitioner must make when implementing these estimators.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 3","pages":""},"PeriodicalIF":1.7000,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Biometrics","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1093/biomtc/ujaf098","RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"BIOLOGY","Score":null,"Total":0}

引用次数: 0

Abstract

When estimating heterogeneous treatment effects, missing outcome data can complicate treatment effect estimation, causing certain subgroups of the population to be poorly represented. In this work, we discuss this commonly overlooked problem and consider the impact that missing at random outcome data has on causal machine learning estimators for the conditional average treatment effect (CATE). We propose 2 de-biased machine learning estimators for the CATE, the mDR-learner, and mEP-learner, which address the issue of under-representation by integrating inverse probability of censoring weights into the DR-learner and EP-learner, respectively. We show that under reasonable conditions, these estimators are oracle efficient and illustrate their favorable performance through simulated data settings, comparing them to existing CATE estimators, including comparison to estimators that use common missing data techniques. We present an example of their application using the GBSG2 trial, exploring treatment effect heterogeneity when comparing hormonal therapies to non-hormonal therapies among breast cancer patients post surgery, and offer guidance on the decisions a practitioner must make when implementing these estimators.

查看原文本刊更多论文

在缺少结果数据的情况下，异质性治疗效果的因果机器学习。

在估计异质性治疗效果时，缺少结局数据会使治疗效果估计复杂化，导致人群的某些亚组代表性不足。在这项工作中，我们讨论了这个经常被忽视的问题，并考虑了随机结果数据缺失对条件平均处理效果（CATE）的因果机器学习估计器的影响。我们为CATE， mDR-learner和mEP-learner提出了2个去偏机器学习估计器，它们分别通过将审查权的逆概率集成到DR-learner和EP-learner中来解决代表性不足的问题。我们表明，在合理的条件下，这些估计器是oracle高效的，并通过模拟数据设置说明它们的良好性能，将它们与现有的CATE估计器进行比较，包括与使用常见缺失数据技术的估计器进行比较。我们在GBSG2试验中展示了它们的应用实例，在比较乳腺癌术后患者的激素治疗与非激素治疗时，探索治疗效果的异质性，并为医生在实施这些评估时必须做出的决定提供指导。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Biometrics 生物-生物学

CiteScore

2.70

自引率

5.30%

发文量

178

审稿时长

4-8 weeks

期刊介绍： The International Biometric Society is an international society promoting the development and application of statistical and mathematical theory and methods in the biosciences, including agriculture, biomedical science and public health, ecology, environmental sciences, forestry, and allied disciplines. The Society welcomes as members statisticians, mathematicians, biological scientists, and others devoted to interdisciplinary efforts in advancing the collection and interpretation of information in the biosciences. The Society sponsors the biennial International Biometric Conference, held in sites throughout the world; through its National Groups and Regions, it also Society sponsors regional and local meetings.