Double machine learning and automated confounder selection: A cautionary tale

IF 1.7, Zone 4 (Medicine), Q2, MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

Journal of Causal Inference, Pub Date: 2021-08-25, DOI: 10.1515/jci-2022-0078

Paul Hünermund, Beyers Louw, Itamar Caspi


Citations: 7

Abstract

Double machine learning (DML) has become an increasingly popular tool for automated variable selection in high-dimensional settings. Even though the ability to deal with a large number of potential covariates can render selection-on-observables assumptions more plausible, there is at the same time a growing risk that endogenous variables are included, which would lead to the violation of conditional independence. This article demonstrates that DML is very sensitive to the inclusion of only a few “bad controls” in the covariate space. The resulting bias varies with the nature of the theoretical causal model, which raises concerns about the feasibility of selecting control variables in a data-driven way.
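The sensitivity to "bad controls" described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' experiment: it assumes a hypothetical linear data-generating process with one confounder `x` and one collider `c` (a variable caused by both treatment and outcome), and it uses plain OLS residualisation (the residual-on-residual, or partialling-out, regression that forms the linear core of DML) in place of flexible machine-learning learners with cross-fitting. All variable names, coefficients, and the sample size are illustrative assumptions.

```python
import numpy as np


def partial_out_estimate(y, d, controls):
    """Residual-on-residual (partialling-out) estimate of the effect of d on y.

    Both y and d are residualised on the controls by OLS, then the outcome
    residual is regressed on the treatment residual. OLS is a stand-in here
    for the ML learners a full DML procedure would use.
    """
    X = np.column_stack([np.ones(len(y)), controls])
    y_res = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    d_res = d - X @ np.linalg.lstsq(X, d, rcond=None)[0]
    return float(d_res @ y_res / (d_res @ d_res))


# Hypothetical data-generating process (an assumption, not taken from the paper).
rng = np.random.default_rng(0)
n = 20_000
true_effect = 1.0

x = rng.normal(size=n)                            # confounder: a "good" control
d = 0.8 * x + rng.normal(size=n)                  # treatment
y = true_effect * d + 0.5 * x + rng.normal(size=n)  # outcome
c = d + y + rng.normal(size=n)                    # collider: a "bad" control

# Controlling only for the confounder recovers the true effect.
good = partial_out_estimate(y, d, x.reshape(-1, 1))

# Adding the collider to the control set biases the estimate.
bad = partial_out_estimate(y, d, np.column_stack([x, c]))

print(f"true effect:            {true_effect:.2f}")
print(f"confounder only:        {good:.2f}")
print(f"confounder + collider:  {bad:.2f}")
```

In this particular setup, conditioning on the single collider pulls the estimate far away from the true effect even though the confounder is still controlled for. The size and direction of the bias depend on the assumed causal structure, which is in line with the abstract's point that the bias varies with the underlying causal model.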


Source journal

Journal of Causal Inference Decision Sciences-Statistics, Probability and Uncertainty

CiteScore

1.90

Self-citation rate

14.30%

Number of articles published

Review time

86 weeks

Journal description: Journal of Causal Inference (JCI) publishes papers on theoretical and applied causal research across the range of academic disciplines that use quantitative tools to study causality.