使用关联数据乘以因果效应的稳健估计

IF 1.6 3区数学 Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Computational Statistics & Data Analysis Pub Date : 2025-04-02 DOI:10.1016/j.csda.2025.108175

Shanshan Luo , Yechi Zhang , Wei Li , Zhi Geng

{"title":"使用关联数据乘以因果效应的稳健估计","authors":"Shanshan Luo , Yechi Zhang , Wei Li , Zhi Geng","doi":"10.1016/j.csda.2025.108175","DOIUrl":null,"url":null,"abstract":"<div><div>Unmeasured confounding presents a common challenge in observational studies, potentially making standard causal parameters unidentifiable without additional assumptions. Given the increasing availability of diverse data sources, exploiting data linkage offers a potential solution to mitigate unmeasured confounding within a primary study of interest. However, this approach often introduces selection bias, as data linkage is feasible only for a subset of the study population. To address such a concern, this paper explores three nonparametric identification strategies assuming that a unit's inclusion in the linked cohort is determined solely by the observed confounders, while acknowledging that the ignorability assumption may depend on some partially unobserved covariates. The existence of multiple identification strategies motivates the development of estimators that effectively capture distinct components of the observed data distribution. Appropriately combining these estimators yields triply robust estimators for the average treatment effect. These estimators remain consistent if at least one of the three distinct parts of the observed data law is correct. Moreover, they are locally efficient if all the models are correctly specified. The proposed estimators are evaluated using simulation studies and real data analysis.</div></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"209 ","pages":"Article 108175"},"PeriodicalIF":1.6000,"publicationDate":"2025-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Multiply robust estimation of causal effects using linked data\",\"authors\":\"Shanshan Luo , Yechi Zhang , Wei Li , Zhi Geng\",\"doi\":\"10.1016/j.csda.2025.108175\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Unmeasured confounding presents a common challenge in observational studies, potentially making standard causal parameters unidentifiable without additional assumptions. Given the increasing availability of diverse data sources, exploiting data linkage offers a potential solution to mitigate unmeasured confounding within a primary study of interest. However, this approach often introduces selection bias, as data linkage is feasible only for a subset of the study population. To address such a concern, this paper explores three nonparametric identification strategies assuming that a unit's inclusion in the linked cohort is determined solely by the observed confounders, while acknowledging that the ignorability assumption may depend on some partially unobserved covariates. The existence of multiple identification strategies motivates the development of estimators that effectively capture distinct components of the observed data distribution. Appropriately combining these estimators yields triply robust estimators for the average treatment effect. These estimators remain consistent if at least one of the three distinct parts of the observed data law is correct. Moreover, they are locally efficient if all the models are correctly specified. The proposed estimators are evaluated using simulation studies and real data analysis.</div></div>\",\"PeriodicalId\":55225,\"journal\":{\"name\":\"Computational Statistics & Data Analysis\",\"volume\":\"209 \",\"pages\":\"Article 108175\"},\"PeriodicalIF\":1.6000,\"publicationDate\":\"2025-04-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computational Statistics & Data Analysis\",\"FirstCategoryId\":\"100\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0167947325000519\",\"RegionNum\":3,\"RegionCategory\":\"数学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computational Statistics & Data Analysis","FirstCategoryId":"100","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167947325000519","RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}

引用次数: 0

摘要

在观察性研究中，无法测量的混杂是一个常见的挑战，如果没有额外的假设，可能会使标准的因果参数无法识别。鉴于各种数据源的可用性日益增加，利用数据链接提供了一种潜在的解决方案，以减轻感兴趣的初级研究中不可测量的混淆。然而，这种方法往往会引入选择偏差，因为数据链接只适用于研究人群的一个子集。为了解决这样的问题，本文探讨了三种非参数识别策略，假设一个单位在相关队列中的包含仅由观察到的混杂因素决定，同时承认可忽略性假设可能依赖于一些部分未观察到的协变量。多种识别策略的存在激发了估计器的发展，这些估计器可以有效地捕获观测数据分布的不同组成部分。适当地结合这些估计量可以得到平均处理效果的三个稳健估计量。如果观测到的数据定律的三个不同部分中至少有一个是正确的，那么这些估计量保持一致。此外，如果正确指定了所有模型，则它们是局部有效的。利用仿真研究和实际数据分析对所提出的估计器进行了评估。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Multiply robust estimation of causal effects using linked data

Unmeasured confounding presents a common challenge in observational studies, potentially making standard causal parameters unidentifiable without additional assumptions. Given the increasing availability of diverse data sources, exploiting data linkage offers a potential solution to mitigate unmeasured confounding within a primary study of interest. However, this approach often introduces selection bias, as data linkage is feasible only for a subset of the study population. To address such a concern, this paper explores three nonparametric identification strategies assuming that a unit's inclusion in the linked cohort is determined solely by the observed confounders, while acknowledging that the ignorability assumption may depend on some partially unobserved covariates. The existence of multiple identification strategies motivates the development of estimators that effectively capture distinct components of the observed data distribution. Appropriately combining these estimators yields triply robust estimators for the average treatment effect. These estimators remain consistent if at least one of the three distinct parts of the observed data law is correct. Moreover, they are locally efficient if all the models are correctly specified. The proposed estimators are evaluated using simulation studies and real data analysis.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Computational Statistics & Data Analysis 数学-计算机：跨学科应用

CiteScore

3.70

自引率

5.60%

发文量

167

审稿时长

60 days

期刊介绍： Computational Statistics and Data Analysis (CSDA), an Official Publication of the network Computational and Methodological Statistics (CMStatistics) and of the International Association for Statistical Computing (IASC), is an international journal dedicated to the dissemination of methodological research and applications in the areas of computational statistics and data analysis. The journal consists of four refereed sections which are divided into the following subject areas: I) Computational Statistics - Manuscripts dealing with: 1) the explicit impact of computers on statistical methodology (e.g., Bayesian computing, bioinformatics,computer graphics, computer intensive inferential methods, data exploration, data mining, expert systems, heuristics, knowledge based systems, machine learning, neural networks, numerical and optimization methods, parallel computing, statistical databases, statistical systems), and 2) the development, evaluation and validation of statistical software and algorithms. Software and algorithms can be submitted with manuscripts and will be stored together with the online article. II) Statistical Methodology for Data Analysis - Manuscripts dealing with novel and original data analytical strategies and methodologies applied in biostatistics (design and analytic methods for clinical trials, epidemiological studies, statistical genetics, or genetic/environmental interactions), chemometrics, classification, data exploration, density estimation, design of experiments, environmetrics, education, image analysis, marketing, model free data exploration, pattern recognition, psychometrics, statistical physics, image processing, robust procedures. [...] III) Special Applications - [...] IV) Annals of Statistical Data Science [...]