The Causal Roadmap and Simulations to Improve the Rigor and Reproducibility of Real-data Applications.

IF 4.7; Zone 2 (Medicine); JCR Q1, PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH

Epidemiology. Pub Date: 2024-11-01; Epub Date: 2024-08-01; DOI: 10.1097/EDE.0000000000001773

Nerissa Nance, Maya L Petersen, Mark van der Laan, Laura B Balzer

Pages 791-800. PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11444352/pdf/

Citations: 0

Abstract

The Causal Roadmap outlines a systematic approach to asking and answering questions of cause and effect: define the quantity of interest, evaluate needed assumptions, conduct statistical estimation, and carefully interpret results. To protect research integrity, it is essential that the algorithm for statistical estimation and inference be prespecified prior to conducting any effectiveness analyses. However, it is often unclear which algorithm will perform optimally for the real-data application. Instead, there is a temptation to simply implement one's favorite algorithm, recycling prior code or relying on the default settings of a computing package. Here, we call for the use of simulations that realistically reflect the application, including key characteristics such as strong confounding and dependent or missing outcomes, to objectively compare candidate estimators and facilitate full specification of the statistical analysis plan. Such simulations are informed by the Causal Roadmap and conducted after data collection but prior to effect estimation. We illustrate with two worked examples. First, in an observational longitudinal study, we use outcome-blind simulations to inform nuisance parameter estimation and variance estimation for longitudinal targeted minimum loss-based estimation. Second, in a cluster randomized trial with missing outcomes, we use treatment-blind simulations to examine type I error control in two-stage targeted minimum loss-based estimation. In both examples, realistic simulations empower us to prespecify an estimation approach with strong expected finite-sample performance and also produce quality-controlled computing code for the actual analysis. Together, this process helps to improve the rigor and reproducibility of our research.
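The treatment-blind check described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: it assumes that "treatment-blind" means the real arm labels are withheld and clusters are repeatedly re-randomized to a dummy treatment, so the true effect is zero by construction and the share of rejections estimates the type I error rate. A simple difference in cluster means with a normal-approximation Wald test stands in for two-stage TMLE, and the function names and cluster outcomes are hypothetical.

```python
import random
import statistics
from statistics import NormalDist


def wald_test_pvalue(treated, control):
    """Two-sided p-value for a difference in means (normal approximation).

    Hypothetical stand-in estimator; NOT two-stage TMLE.
    """
    diff = statistics.fmean(treated) - statistics.fmean(control)
    # Unequal-variance (Welch-style) standard error of the difference in means.
    se = (statistics.variance(treated) / len(treated)
          + statistics.variance(control) / len(control)) ** 0.5
    z = diff / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def treatment_blind_type1_error(cluster_outcomes, n_sims=2000, alpha=0.05, seed=1):
    """Estimate type I error by re-randomizing a dummy treatment.

    The real treatment labels are never used: each replicate splits the
    clusters at random into two arms, so the dummy treatment has no effect by
    construction and every rejection is a false positive.
    """
    rng = random.Random(seed)
    n = len(cluster_outcomes)
    half = n // 2
    rejections = 0
    for _ in range(n_sims):
        idx = list(range(n))
        rng.shuffle(idx)
        treated = [cluster_outcomes[i] for i in idx[:half]]
        control = [cluster_outcomes[i] for i in idx[half:]]
        if wald_test_pvalue(treated, control) < alpha:
            rejections += 1
    return rejections / n_sims


# Hypothetical cluster-level outcomes (e.g., cluster prevalence); not real trial data.
data_rng = random.Random(2024)
small_trial = [data_rng.gauss(0.3, 0.1) for _ in range(10)]
large_trial = [data_rng.gauss(0.3, 0.1) for _ in range(100)]

print("10 clusters: ", treatment_blind_type1_error(small_trial))
print("100 clusters:", treatment_blind_type1_error(large_trial))
```

With only a handful of clusters, a normal approximation tends to understate uncertainty, which is the kind of finite-sample problem such a simulation is meant to surface before the real analysis. In practice one would swap in the prespecified estimator and the actual, treatment-masked cluster data.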

Source journal

Epidemiology (Medicine: Public, Environmental & Occupational Health)

CiteScore: 6.70

Self-citation rate: 3.70%

Articles published: 177

Review time: 6-12 weeks

Journal description: Epidemiology publishes original research from all fields of epidemiology. The journal also welcomes review articles and meta-analyses, novel hypotheses, descriptions and applications of new methods, and discussions of research theory or public health policy. We give special consideration to papers from developing countries.