Double machine learning and automated confounder selection: A cautionary tale
Authors: Paul Hünermund, Beyers Louw, Itamar Caspi
Journal: Journal of Causal Inference
DOI: 10.1515/jci-2022-0078 (https://doi.org/10.1515/jci-2022-0078)
Publication date (as listed): 2021-08-25
Abstract: Double machine learning (DML) has become an increasingly popular tool for automated variable selection in high-dimensional settings. Even though the ability to deal with a large number of potential covariates can render selection-on-observables assumptions more plausible, there is at the same time a growing risk that endogenous variables are included, which would lead to the violation of conditional independence. This article demonstrates that DML is very sensitive to the inclusion of only a few “bad controls” in the covariate space. The resulting bias varies with the nature of the theoretical causal model, which raises concerns about the feasibility of selecting control variables in a data-driven way.
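To make the "bad control" concern concrete, the following is a minimal illustrative sketch, not the authors' simulation design. It assumes a simple linear data-generating process with a true treatment effect of 1, uses a cross-fitted partialling-out estimator in the spirit of DML, and swaps in ordinary least squares for the machine-learning nuisance learners so that it needs only NumPy. The function names (`ols_predict`, `partialling_out`, `simulate`) and the collider variable are hypothetical choices made for illustration; a collider is only one of several kinds of bad control.

```python
import numpy as np


def ols_predict(X_train, y_train, X_test):
    """Fit OLS with an intercept on the training fold and predict on the test fold."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    beta, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return A_test @ beta


def partialling_out(y, d, controls, n_folds=2, seed=0):
    """Cross-fitted partialling-out estimate of the effect of d on y.

    Assumption: OLS stands in for the ML learners (e.g. lasso) used in DML.
    """
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    y_res = np.empty(n)
    d_res = np.empty(n)
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        # Residualize outcome and treatment using models fit on the other fold(s).
        y_res[test_idx] = y[test_idx] - ols_predict(controls[train], y[train], controls[test_idx])
        d_res[test_idx] = d[test_idx] - ols_predict(controls[train], d[train], controls[test_idx])
    # Final stage: regress outcome residuals on treatment residuals.
    return float(d_res @ y_res / (d_res @ d_res))


def simulate(n=5000, theta=1.0, seed=42):
    """Hypothetical linear DGP: x confounds d and y; the collider is caused by both."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, 3))                      # legitimate confounders
    d = x.sum(axis=1) + rng.normal(size=n)           # treatment
    y = theta * d + x.sum(axis=1) + rng.normal(size=n)  # outcome
    collider = d + y + rng.normal(size=n)            # bad control (post-treatment)
    good = partialling_out(y, d, x)
    bad = partialling_out(y, d, np.column_stack([x, collider]))
    return good, bad


good, bad = simulate()
print(f"confounders only:         {good:.3f}")
print(f"confounders + bad control: {bad:.3f}")
```

Under this assumed data-generating process, the estimate that conditions only on the confounders should land near the true effect of 1, while adding a single collider to the control set pulls the estimate toward 0. A different causal structure would produce a different bias, which is consistent with the abstract's point that the bias depends on the underlying causal model.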
About the journal:
Journal of Causal Inference (JCI) publishes papers on theoretical and applied causal research across the range of academic disciplines that use quantitative tools to study causality.