Double machine learning and automated confounder selection: A cautionary tale

Impact factor 1.7 · Zone 4 (Medicine) · Q2 · MATHEMATICS, INTERDISCIPLINARY APPLICATIONS
Paul Hünermund, Beyers Louw, Itamar Caspi
DOI: 10.1515/jci-2022-0078 (https://doi.org/10.1515/jci-2022-0078)
Journal: Journal of Causal Inference
Publication date (as listed): 2021-08-25
Publication type: Journal Article
Citations: 7

Abstract

Double machine learning (DML) has become an increasingly popular tool for automated variable selection in high-dimensional settings. Even though the ability to deal with a large number of potential covariates can render selection-on-observables assumptions more plausible, there is at the same time a growing risk that endogenous variables are included, which would lead to the violation of conditional independence. This article demonstrates that DML is very sensitive to the inclusion of only a few “bad controls” in the covariate space. The resulting bias varies with the nature of the theoretical causal model, which raises concerns about the feasibility of selecting control variables in a data-driven way.
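
The sensitivity to a single "bad control" can be illustrated with a small simulation. The sketch below is not the authors' simulation design: it assumes a toy linear model in which the treatment is exogenous, the true effect is 1, and one candidate covariate is a collider caused by both treatment and outcome. Plain least squares stands in for the machine-learning nuisance learners, and all names (`fit_line`, `residualize`, `partialling_out`, `simulate`) are hypothetical.

```python
import random


def fit_line(x, y):
    """Least-squares intercept and slope of y on a single regressor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def residualize(target, control, train, test):
    """Residuals of target on the test indices, using a fit from the train indices."""
    if control is None:
        # No controls: the nuisance prediction is just the training mean.
        mean = sum(target[i] for i in train) / len(train)
        return [target[i] - mean for i in test]
    a, b = fit_line([control[i] for i in train], [target[i] for i in train])
    return [target[i] - (a + b * control[i]) for i in test]


def partialling_out(y, d, control=None):
    """Two-fold cross-fitted residual-on-residual estimate of the effect of d on y."""
    n = len(y)
    folds = [list(range(0, n, 2)), list(range(1, n, 2))]
    num = den = 0.0
    for k in (0, 1):
        test, train = folds[k], folds[1 - k]
        ry = residualize(y, control, train, test)
        rd = residualize(d, control, train, test)
        num += sum(a * b for a, b in zip(rd, ry))
        den += sum(a * a for a in rd)
    return num / den


def simulate(n=20000, theta=1.0, seed=0):
    """Assumed toy model: D exogenous, Y = theta*D + noise, M = D + Y + noise."""
    rng = random.Random(seed)
    d = [rng.gauss(0, 1) for _ in range(n)]
    y = [theta * di + rng.gauss(0, 1) for di in d]
    m = [di + yi + rng.gauss(0, 1) for di, yi in zip(d, y)]  # collider ("bad control")
    return partialling_out(y, d), partialling_out(y, d, control=m)


theta_clean, theta_bad = simulate()
print(f"estimate without controls:       {theta_clean:.3f}")
print(f"estimate with collider included: {theta_bad:.3f}")
```

Under these assumptions the estimate without controls lands close to the true effect of 1, while conditioning on the collider pulls the estimate toward 0 (in this particular linear model the population coefficient on the treatment is exactly 0 once the collider is included). A different causal structure, such as a mediator instead of a collider, would produce a different bias, which is the point the abstract makes about the bias depending on the underlying causal model.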
Source journal: Journal of Causal Inference
Subject area: Decision Sciences (Statistics, Probability and Uncertainty)
CiteScore: 1.90
Self-citation rate: 14.30%
Articles published: 15
Review time: 86 weeks
About the journal: Journal of Causal Inference (JCI) publishes papers on theoretical and applied causal research across the range of academic disciplines that use quantitative tools to study causality.