Guidelines and Best Practices for the Use of Targeted Maximum Likelihood and Machine Learning When Estimating Causal Effects of Exposures on Time-To-Event Outcomes.
Denis Talbot, Awa Diop, Miceline Mésidor, Yohann Chiu, Caroline Sirois, Andrew J Spieker, Antoine Pariente, Pernelle Noize, Marc Simard, Miguel Angel Luque Fernandez, Michael Schomaker, Kenji Fujita, Danijela Gnjidic, Mireille E Schnitzer
Statistics in Medicine, 44(6), e70034. Published 2025-03-15. DOI: 10.1002/sim.70034. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11905698/pdf/
Abstract
Targeted maximum likelihood estimation (TMLE) is an increasingly popular framework for the estimation of causal effects. It requires modeling both the exposure and the outcome but is doubly robust in the sense that it is valid if at least one of these models is correctly specified. In addition, TMLE allows for flexible modeling of both the exposure and the outcome with machine learning methods. This provides better control for measured confounders since the model specification automatically adapts to the data, instead of needing to be specified by the analyst a priori. Despite these methodological advantages, TMLE remains less popular than alternatives, in part because of its less accessible theory and implementation. While some tutorials have been proposed, none address the case of a time-to-event outcome. This tutorial provides a detailed step-by-step explanation of the implementation of TMLE for estimating the effect of a point binary or multilevel exposure on a time-to-event outcome, modeled as counterfactual survival curves and causal hazard ratios. The tutorial also provides guidelines on how best to use TMLE in practice, including aspects related to study design, choice of covariates, controlling biases, and use of machine learning. R code is provided to illustrate each step using simulated data (https://github.com/detal9/SurvTMLE). To facilitate implementation, a general R function implementing TMLE with options to use machine learning is also provided. The method is illustrated in a real-data analysis concerning the effectiveness of statins for the prevention of a first cardiovascular disease among older adults in Québec, Canada, between 2013 and 2018.
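The double-robustness property described in the abstract can be illustrated with a deliberately simplified sketch. This is not the authors' R implementation from the repository. It is a Python sketch that assumes no censoring and a single fixed time horizon, so the counterfactual survival probability reduces to a binary-outcome TMLE. The simulated data, the variable names (W for a binary confounder, A for the exposure, Y = 1 if event-free at the horizon), and all function names are hypothetical. The initial outcome model ignores W on purpose, while the exposure model is correct, so the targeting step is what removes the confounding bias.

```python
import math
import random

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    return math.log(p / (1.0 - p))

def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)

def simulate(n, seed=2024):
    """Hypothetical data: W confounder, A exposure, Y = 1 if event-free at the horizon."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = int(rng.random() < 0.5)
        a = int(rng.random() < (0.7 if w else 0.3))   # exposure depends on W
        y = int(rng.random() < expit(-0.5 + 1.0 * a + 1.5 * w))
        rows.append((w, a, y))
    return rows

def fit_epsilon(offsets, h, y, iters=50):
    """Newton-Raphson for a logistic regression of y on h with a fixed offset."""
    eps = 0.0
    for _ in range(iters):
        p = [expit(o + eps * hi) for o, hi in zip(offsets, h)]
        score = sum(hi * (yi - pi) for hi, yi, pi in zip(h, y, p))
        info = sum(hi * hi * pi * (1.0 - pi) for hi, pi in zip(h, p))
        step = score / info
        eps += step
        if abs(step) < 1e-12:
            break
    return eps

def tmle_survival(rows):
    # Initial outcome model, deliberately misspecified: it ignores W.
    q_init = {a: mean(y for _, aa, y in rows if aa == a) for a in (0, 1)}
    # Exposure model g(w) = P(A = 1 | W = w), correctly specified (saturated).
    g = {w: mean(a for ww, a, _ in rows if ww == w) for w in (0, 1)}
    surv = {}
    for a in (0, 1):
        def h_of(w, a=a):
            # Clever covariate: inverse probability of receiving exposure level a.
            return 1.0 / (g[w] if a == 1 else 1.0 - g[w])
        arm = [(w, y) for w, aa, y in rows if aa == a]
        offsets = [logit(q_init[a])] * len(arm)
        eps = fit_epsilon(offsets, [h_of(w) for w, _ in arm], [y for _, y in arm])
        # Updated predictions under exposure level a, averaged over everyone.
        surv[a] = mean(expit(logit(q_init[a]) + eps * h_of(w)) for w, _, _ in rows)
    return q_init, surv

rows = simulate(20000)
q_init, surv = tmle_survival(rows)
naive_diff = q_init[1] - q_init[0]      # confounded contrast from the initial model
tmle_diff = surv[1] - surv[0]           # targeted contrast
# True contrast under the simulation's data-generating mechanism.
true_diff = 0.5 * (expit(0.5) + expit(2.0)) - 0.5 * (expit(-0.5) + expit(1.0))
print(f"naive: {naive_diff:.3f}  TMLE: {tmle_diff:.3f}  truth: {true_diff:.3f}")
```

In this toy setting the initial outcome model is confounded by W, yet the targeted estimate should land close to the true difference in survival probabilities because the exposure model is correct. A realistic analysis of time-to-event data would also need to handle censoring and multiple time points, which this sketch leaves out.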
About the journal:
The journal aims to influence practice in medicine and its associated sciences through the publication of papers on statistical and other quantitative methods. Papers will explain new methods and demonstrate their application, preferably through a substantive, real, motivating example or a comprehensive evaluation based on an illustrative example. Alternatively, papers will report on case studies where creative use or technical generalizations of established methodology are directed towards a substantive application. Reviews of, and tutorials on, general topics relevant to the application of statistics to medicine will also be published. The main criteria for publication are appropriateness of the statistical methods to a particular medical problem and clarity of exposition. Papers with primarily mathematical content will be excluded. The journal aims to enhance communication between statisticians, clinicians and medical researchers.