Data-driven covariate selection for confounding adjustment by focusing on the stability of the effect estimator.

IF 7.6 · Zone 1 (Psychology) · JCR Q1, PSYCHOLOGY, MULTIDISCIPLINARY

Psychological Methods. Published 2024-10-01 (Epub 2023-04-27). DOI: 10.1037/met0000564

Wen Wei Loh, Dongning Ren

Pages: 947-966

Citations: 0

Abstract

Valid inference of cause-and-effect relations in observational studies necessitates adjusting for common causes of the focal predictor (i.e., treatment) and the outcome. When such common causes, henceforth termed confounders, remain unadjusted for, they generate spurious correlations that lead to biased causal effect estimates. But routine adjustment for all available covariates, when only a subset are truly confounders, is known to yield potentially inefficient and unstable estimators. In this article, we introduce a data-driven confounder selection strategy that focuses on stable estimation of the treatment effect. The approach exploits the causal knowledge that after adjusting for confounders to eliminate all confounding biases, adding any remaining non-confounding covariates associated with only treatment or outcome, but not both, should not systematically change the effect estimator. The strategy proceeds in two steps. First, we prioritize covariates for adjustment by probing how strongly each covariate is associated with treatment and outcome. Next, we gauge the stability of the effect estimator by evaluating its trajectory adjusting for different covariate subsets. The smallest subset that yields a stable effect estimate is then selected. Thus, the strategy offers direct insight into the (in)sensitivity of the effect estimator to the chosen covariates for adjustment. The ability to correctly select confounders and yield valid causal inferences following data-driven covariate selection is evaluated empirically using extensive simulation studies. Furthermore, we compare the introduced method empirically with routine variable selection methods. Finally, we demonstrate the procedure using two publicly available real-world datasets. A step-by-step practical guide with user-friendly R functions is included. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
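The two-step strategy can be illustrated with a small simulation. The following is a minimal Python sketch under stated assumptions. It is not the authors' R functions. It assumes a linear outcome model fit by ordinary least squares. It ranks covariates by the product of their absolute correlations with treatment and outcome, which is a hypothetical stand-in for the paper's prioritization step. It treats the trajectory as stable once every later estimate stays within a hypothetical absolute tolerance `tol`. The names `rank_covariates`, `effect_trajectory`, and `select_stable_subset` are illustrative only.

```python
import numpy as np


def ols_treatment_coef(y, a, X):
    """OLS coefficient on treatment `a` when regressing `y` on an intercept, `a`, and the columns of `X`."""
    design = np.column_stack([np.ones(len(y)), a, X])  # X may have zero columns
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


def rank_covariates(y, a, X):
    """Order covariate columns by |corr(x, a)| * |corr(x, y)|, strongest first.

    Assumption: this product score is a simple, hypothetical proxy for how strongly
    each covariate is associated with both treatment and outcome.
    """
    scores = [abs(np.corrcoef(X[:, j], a)[0, 1]) * abs(np.corrcoef(X[:, j], y)[0, 1])
              for j in range(X.shape[1])]
    return list(np.argsort(scores)[::-1])


def effect_trajectory(y, a, X, order):
    """Treatment-effect estimates adjusting for the top-k ranked covariates, for k = 0..p."""
    return [ols_treatment_coef(y, a, X[:, order[:k]]) for k in range(len(order) + 1)]


def select_stable_subset(trajectory, tol):
    """Smallest k such that every later estimate stays within `tol` of estimate k.

    Assumption: `tol` is a hypothetical, user-chosen absolute tolerance.
    The last k always qualifies, so the loop always returns.
    """
    for k, est in enumerate(trajectory):
        if all(abs(later - est) <= tol for later in trajectory[k:]):
            return k


# Simulated data: c1, c2 are confounders; z affects only treatment;
# p affects only the outcome; noise is unrelated to both.
rng = np.random.default_rng(2024)
n = 5000
c1, c2, z, p, noise = rng.standard_normal((5, n))
a = 0.8 * c1 + 0.8 * c2 + 0.8 * z + rng.standard_normal(n)
y = 1.0 * a + c1 + c2 + p + rng.standard_normal(n)  # true treatment effect is 1.0
X = np.column_stack([c1, c2, z, p, noise])
names = ["c1", "c2", "z", "p", "noise"]

order = rank_covariates(y, a, X)
traj = effect_trajectory(y, a, X, order)
k = select_stable_subset(traj, tol=0.1)
selected = [names[j] for j in order[:k]]
print("ranking:", [names[j] for j in order])
print("trajectory:", np.round(traj, 3))
print("selected:", selected, "estimate:", round(traj[k], 3))
```

In this simulated setting, the unadjusted estimate is biased upward because c1 and c2 drive both treatment and outcome. Once both confounders are in the model, adding z or p should not systematically move the estimate. The smallest stable subset is therefore expected to be {c1, c2}. A fixed absolute tolerance is the simplest choice. A tolerance scaled to the estimator's standard error would be a natural refinement.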

Source journal

Psychological Methods (PSYCHOLOGY, MULTIDISCIPLINARY)

CiteScore: 13.10
Self-citation rate: 7.10%
Articles published: 159

Journal description: Psychological Methods is devoted to the development and dissemination of methods for collecting, analyzing, understanding, and interpreting psychological data. Its purpose is the dissemination of innovations in research design, measurement, methodology, and quantitative and qualitative analysis to the psychological community; its further purpose is to promote effective communication about related substantive and methodological issues. The audience is expected to be diverse and to include those who develop new procedures, those who are responsible for undergraduate and graduate training in design, measurement, and statistics, as well as those who employ those procedures in research.