Counterfactual prediction from machine learning models: transportability and joint analysis for model development and evaluation using multi-source data.

IF 2.6

Diagnostic and prognostic research Pub Date : 2025-10-02 DOI:10.1186/s41512-025-00201-y

Sarah C Voter, Issa J Dahabreh, Christopher B Boyer, Habib Rahbar, Despina Kontos, Jon A Steingrimsson

Diagnostic and prognostic research, 9(1): 22. Journal Article. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12490139/pdf/


Abstract

Background: When a machine learning model is developed and evaluated in a setting where the treatment assignment process differs from the setting of intended model deployment, failure to account for this difference can lead to suboptimal model development and biased estimates of model performance.

Methods: We consider the setting where data from a randomized trial and an observational study emulating the trial are available for machine learning model development and evaluation. We provide two approaches for estimating the model and assessing model performance under a hypothetical treatment strategy in the target population underlying the observational study. The first approach uses counterfactual predictions from the observational study only and relies on the assumption of conditional exchangeability between treated and untreated individuals (no unmeasured confounding). The second approach leverages the exchangeability between treatment groups in the trial (supported by study design) to "transport" estimates from the trial to the population underlying the observational study, relying on an additional assumption of conditional exchangeability between the populations underlying the observational study and the randomized trial.
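As an illustration of the two approaches described above, the following is a minimal sketch of weighted estimators for the mean squared error of a fixed prediction model under a hypothetical "everyone receives treatment a_target" strategy. It is not taken from the paper: the function names, the choice of squared-error loss, and the use of normalized (Hajek-style) weights are assumptions made here for illustration. The propensity scores and trial-participation probabilities are assumed to have been estimated beforehand. The first function uses observational data only and inverse probability of treatment weights. The second uses trial data only and inverse odds of trial participation weights to target the population underlying the observational study.

```python
def observational_mse(y, pred, a, p_treat, a_target=1):
    """Weighted MSE under 'everyone receives a_target', observational data only.

    Assumption (illustrative): p_treat[i] is an estimate of Pr[A=1 | X_i] in
    the observational study. Validity relies on conditional exchangeability
    between treated and untreated individuals.
    """
    num = den = 0.0
    for yi, fi, ai, ei in zip(y, pred, a, p_treat):
        if ai != a_target:
            continue  # only individuals who actually received a_target contribute
        p_a = ei if a_target == 1 else 1.0 - ei
        w = 1.0 / p_a  # inverse probability of treatment weight
        num += w * (yi - fi) ** 2
        den += w
    return num / den  # normalized (Hajek-style) weighted average


def transported_mse(y, pred, a, p_trial, p_treat_trial, a_target=1):
    """Weighted MSE under 'everyone receives a_target', trial data only,
    transported to the population underlying the observational study.

    Assumptions (illustrative): p_trial[i] is an estimate of
    Pr[S=1 | X_i] (trial membership) fitted on the stacked trial and
    observational data; p_treat_trial[i] is Pr[A=1 | X_i, S=1], which is
    known by design in many trials. Validity additionally relies on
    conditional exchangeability between the two populations.
    """
    num = den = 0.0
    for yi, fi, ai, si, ei in zip(y, pred, a, p_trial, p_treat_trial):
        if ai != a_target:
            continue
        p_a = ei if a_target == 1 else 1.0 - ei
        w = ((1.0 - si) / si) / p_a  # inverse odds of participation x inverse treatment probability
        num += w * (yi - fi) ** 2
        den += w
    return num / den


# Toy inputs, made up purely to show the call signatures.
obs_mse = observational_mse(
    y=[1, 0, 3], pred=[0, 0, 1], a=[1, 0, 1], p_treat=[0.5, 0.5, 0.25]
)
trial_mse = transported_mse(
    y=[1, 3], pred=[0, 1], a=[1, 1], p_trial=[0.5, 0.25], p_treat_trial=[0.5, 0.5]
)
```

Because the two estimators rely on different assumptions, a large gap between them on real data may point to a violation of at least one assumption, which is the motivation for benchmarking the trial and observational results.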

Results: We examine the assumptions underlying both approaches for fitting the model and estimating performance in the target population and provide estimators for both objectives. We then develop a joint estimation strategy that combines data from the trial and the observational study, and discuss benchmarking of the trial and observational results.

Conclusions: Both the observational and transportability analyses can be used to fit a model and estimate performance under a counterfactual treatment strategy in the population underlying the observational data, but they rely on different assumptions. In either case, the assumptions are untestable, and deciding which method is more appropriate requires careful contextual consideration. If all assumptions hold, combining the data from the observational study and the randomized trial can yield more efficient estimation.

