Denis Talbot, Awa Diop, Miceline Mésidor, Yohann Chiu, Caroline Sirois, Andrew J Spieker, Antoine Pariente, Pernelle Noize, Marc Simard, Miguel Angel Luque Fernandez, Michael Schomaker, Kenji Fujita, Danijela Gnjidic, Mireille E Schnitzer
Statistics in Medicine, 44(6): e70034. Published 2025-03-15. DOI: 10.1002/sim.70034 (https://doi.org/10.1002/sim.70034). Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11905698/pdf/
Guidelines and Best Practices for the Use of Targeted Maximum Likelihood and Machine Learning When Estimating Causal Effects of Exposures on Time-To-Event Outcomes
Abstract
Targeted maximum likelihood estimation (TMLE) is an increasingly popular framework for the estimation of causal effects. It requires modeling both the exposure and outcome but is doubly robust in the sense that it is valid if at least one of these models is correctly specified. In addition, TMLE allows for flexible modeling of both the exposure and outcome with machine learning methods. This provides better control for measured confounders since the model specification automatically adapts to the data, instead of needing to be specified by the analyst a priori. Despite these methodological advantages, TMLE remains less popular than alternatives in part because of its less accessible theory and implementation. While some tutorials have been proposed, none address the case of a time-to-event outcome. This tutorial provides a detailed step-by-step explanation of the implementation of TMLE for estimating the effect of a point binary or multilevel exposure on a time-to-event outcome, modeled as counterfactual survival curves and causal hazard ratios. The tutorial also provides guidelines on how best to use TMLE in practice, including aspects related to study design, choice of covariates, controlling biases and use of machine learning. R code is provided to illustrate each step using simulated data (https://github.com/detal9/SurvTMLE). To facilitate implementation, a general R function implementing TMLE with options to use machine learning is also provided. The method is illustrated in a real-data analysis concerning the effectiveness of statins for the prevention of a first cardiovascular disease among older adults in Québec, Canada, between 2013 and 2018.
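To make the targeting idea concrete, here is a minimal Python sketch of TMLE for one point on a counterfactual survival curve, S_a(t0) = P(T^a > t0), under a binary point exposure. It is not the authors' R code or the SurvTMLE function, and it rests on loudly stated simplifying assumptions: no censoring before t0 (so survival past t0 reduces to the binary indicator Y = 1{T > t0}), main-terms logistic regressions in place of machine learning, and toy simulated data. All function names (`fit_logistic`, `tmle_survival_at_t`, `simulate`, `true_survival`) and the data-generating coefficients are hypothetical. The outcome model is deliberately misspecified (logistic, while the simulated event times are exponential) and the exposure model is correctly specified, so double robustness suggests the estimates should still land near the true values.

```python
import numpy as np


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-x))


def fit_logistic(X, y, offset=None, n_iter=100, tol=1e-10):
    """Logistic regression by Newton-Raphson, with an optional fixed offset."""
    offset = np.zeros(len(y)) if offset is None else offset
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(offset + X @ beta)
        grad = X.T @ (y - p)                          # score
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # observed information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def tmle_survival_at_t(W, A, Y, g_bounds=(0.01, 0.99)):
    """Hypothetical helper: TMLE of S_a(t0) = P(T^a > t0) for a in {0, 1}.

    Y = 1{T > t0}. ASSUMES no censoring before t0 and uses main-terms
    logistic regressions instead of machine learning (e.g. Super Learner).
    """
    n = len(Y)
    one = np.ones((n, 1))
    # Step 1: initial outcome model Q(A, W) = P(Y = 1 | A, W), logit scale
    XQ = np.hstack([one, A[:, None], W])
    beta_Q = fit_logistic(XQ, Y)
    logitQ_A = XQ @ beta_Q
    logitQ = {a: np.hstack([one, np.full((n, 1), float(a)), W]) @ beta_Q
              for a in (0, 1)}
    # Step 2: exposure model g(W) = P(A = 1 | W), bounded away from 0 and 1
    Xg = np.hstack([one, W])
    g1 = np.clip(expit(Xg @ fit_logistic(Xg, A)), *g_bounds)
    g = {1: g1, 0: 1.0 - g1}
    est, ic = {}, {}
    for a in (1, 0):
        # Step 3: clever covariate H_a = 1{A = a} / g_a(W) and a one-parameter
        # logistic fluctuation of the initial fit (offset = logit Q(A, W))
        H = (A == a) / g[a]
        eps = fit_logistic(H[:, None], Y, offset=logitQ_A)[0]
        Q_star = expit(logitQ[a] + eps / g[a])   # targeted Q*(a, W)
        est[a] = Q_star.mean()                    # plug-in estimate of S_a(t0)
        # Step 4: efficient influence function, used for the standard error
        ic[a] = H * (Y - Q_star) + Q_star - est[a]
    ic_diff = ic[1] - ic[0]
    return {"S1": est[1], "S0": est[0], "diff": est[1] - est[0],
            "se_diff": ic_diff.std(ddof=1) / np.sqrt(n),
            "ic1": ic[1], "ic0": ic[0]}


def simulate(n, rng, t0=5.0):
    """Toy data: exponential event times whose hazard depends on W and A."""
    W = np.column_stack([rng.normal(size=n), rng.binomial(1, 0.5, size=n)])
    A = rng.binomial(1, expit(-0.3 + 0.6 * W[:, 0] + 0.5 * W[:, 1])).astype(float)
    rate = np.exp(-2.0 + 0.4 * W[:, 0] + 0.3 * W[:, 1] - 0.5 * A)
    T = rng.exponential(1.0 / rate)
    return W, A, (T > t0).astype(float)


def true_survival(a, rng, t0=5.0, m=1_000_000):
    """Monte Carlo value of P(T^a > t0) = E_W[exp(-rate_a(W) * t0)]."""
    w1, w2 = rng.normal(size=m), rng.binomial(1, 0.5, size=m)
    return np.exp(-t0 * np.exp(-2.0 + 0.4 * w1 + 0.3 * w2 - 0.5 * a)).mean()


if __name__ == "__main__":
    rng = np.random.default_rng(2025)
    W, A, Y = simulate(5000, rng)
    res = tmle_survival_at_t(W, A, Y)
    lo = res["diff"] - 1.96 * res["se_diff"]
    hi = res["diff"] + 1.96 * res["se_diff"]
    print(f"S1={res['S1']:.3f} S0={res['S0']:.3f} "
          f"diff={res['diff']:.3f} 95% CI [{lo:.3f}, {hi:.3f}]")
```

Because the fluctuation regresses Y on the clever covariate alone, its fitted score equation makes the influence function average to zero, which is what licenses the simple standard error. A full analysis with right censoring would typically also model the censoring mechanism and target the discrete-time hazards at each time up to t0; this sketch omits that, and a survival curve would repeat the estimation over a grid of t0 values.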
Journal description:
The journal aims to influence practice in medicine and its associated sciences through the publication of papers on statistical and other quantitative methods. Papers will explain new methods and demonstrate their application, preferably through a substantive, real, motivating example or a comprehensive evaluation based on an illustrative example. Alternatively, papers will report on case studies where creative use or technical generalizations of established methodology are directed towards a substantive application. Reviews of, and tutorials on, general topics relevant to the application of statistics to medicine will also be published. The main criteria for publication are appropriateness of the statistical methods to a particular medical problem and clarity of exposition. Papers with primarily mathematical content will be excluded. The journal aims to enhance communication between statisticians, clinicians and medical researchers.