Double machine learning for sample selection models

Impact factor: 4.6 · JCR: Q2 · Category: Materials Science, Biomaterials
Michela Bia, Martin Huber, Lukáš Lafférs
DOI: 10.1080/07350015.2023.2271071 (https://doi.org/10.1080/07350015.2023.2271071)
Published: 2023-10-16 · Publication type: Journal Article · Open access: no
Citations: 5

Abstract

This paper considers the evaluation of discretely distributed treatments when outcomes are only observed for a subpopulation due to sample selection or outcome attrition. For identification, we combine a selection-on-observables assumption for treatment assignment with either selection-on-observables or instrumental variable assumptions concerning the outcome attrition/sample selection process. We also consider dynamic confounding, meaning that covariates that jointly affect sample selection and the outcome may (at least partly) be influenced by the treatment. To control in a data-driven way for a potentially high-dimensional set of pre- and/or post-treatment covariates, we adapt the double machine learning framework for treatment evaluation to sample selection problems. We make use of (a) Neyman-orthogonal, doubly robust, and efficient score functions, which imply the robustness of treatment effect estimation to moderate regularization biases in the machine learning-based estimation of the outcome, treatment, or sample selection models and (b) sample splitting (or cross-fitting) to prevent overfitting bias. We demonstrate that the proposed estimators are asymptotically normal and root-n consistent and investigate their finite sample properties in a simulation study. We also apply our proposed methodology to the Job Corps data. The estimator is available in the causalweight package for the statistical software R.

Keywords: sample selection; double machine learning; doubly robust estimation; efficient score

Disclaimer: As a service to authors and researchers we are providing this version of an accepted manuscript (AM). Copyediting, typesetting, and review of the resulting proofs will be undertaken on this manuscript before final publication of the Version of Record (VoR). During production and pre-press, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal relate to these versions also.
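To make the two ingredients named in the abstract concrete, here is a minimal Python sketch of a cross-fitted, doubly robust estimate of the average treatment effect when selection into the observed sample depends only on observables. It is not the authors' implementation (that lives in the R package causalweight). Everything in it is an assumption made for illustration: the simulated data, the function names, the trimming rule, and above all the nuisance estimators, which are plain cell frequencies over one binary covariate standing in for the machine learners the paper has in mind. For treatment level d, the score it averages is 1{D=d}·S·(Y − mu(d,X)) / (p_d(X)·pi(d,X)) + mu(d,X), with p_d(X) = Pr(D=d | X), pi(d,X) = Pr(S=1 | D=d, X), and mu(d,X) = E[Y | D=d, S=1, X].

```python
import random
from collections import defaultdict


def simulate(n, seed=0):
    """Toy data (hypothetical): binary covariate x, treatment d, selection s.
    The outcome y is only observed when s == 1; the true ATE is 1.0."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.randint(0, 1)
        d = int(rng.random() < 0.3 + 0.4 * x)            # treatment depends on x
        s = int(rng.random() < 0.5 + 0.2 * x + 0.1 * d)  # selection depends on (d, x)
        y = (1.0 * d + 2.0 * x + rng.gauss(0.0, 1.0)) if s == 1 else None
        rows.append((y, d, s, x))
    return rows


def fit_nuisances(train, d):
    """Estimate p_d(x), pi(d, x) and mu(d, x) by cell frequencies.
    Stand-in for ML learners; assumes every (d, s=1, x) cell is non-empty."""
    n_x, n_dx, n_sdx = defaultdict(int), defaultdict(int), defaultdict(int)
    sum_y = defaultdict(float)
    for y, di, s, x in train:
        n_x[x] += 1
        if di == d:
            n_dx[x] += 1
            if s == 1:
                n_sdx[x] += 1
                sum_y[x] += y
    p = {x: n_dx[x] / n_x[x] for x in n_x}       # Pr(D = d | X = x)
    pi = {x: n_sdx[x] / n_dx[x] for x in n_x}    # Pr(S = 1 | D = d, X = x)
    mu = {x: sum_y[x] / n_sdx[x] for x in n_x}   # E[Y | D = d, S = 1, X = x]
    return p, pi, mu


def score(y, di, s, x, d, p, pi, mu):
    """Doubly robust score for E[Y(d)]; y is never touched when s == 0."""
    resid = (y - mu[x]) if (di == d and s == 1) else 0.0
    return resid / (p[x] * pi[x]) + mu[x]


def dml_ate(rows, k=3, trim=0.01):
    """Cross-fitted ATE estimate and standard error.
    Nuisances are fitted on k-1 folds and evaluated on the held-out fold."""
    folds = [rows[i::k] for i in range(k)]
    psi = []
    for j in range(k):
        train = [r for i, f in enumerate(folds) if i != j for r in f]
        nuis = {d: fit_nuisances(train, d) for d in (0, 1)}
        for y, di, s, x in folds[j]:
            # Assumed trimming rule: drop units with a tiny propensity product.
            if any(nuis[d][0][x] * nuis[d][1][x] < trim for d in (0, 1)):
                continue
            psi.append(score(y, di, s, x, 1, *nuis[1])
                       - score(y, di, s, x, 0, *nuis[0]))
    n = len(psi)
    ate = sum(psi) / n
    var = sum((v - ate) ** 2 for v in psi) / n
    return ate, (var / n) ** 0.5
```

Calling `dml_ate(simulate(5000, seed=1))` returns a point estimate and a standard error based on the empirical variance of the scores. With a richer covariate set, the cell frequencies in `fit_nuisances` would be replaced by fitted learners such as lasso or random forests, while the fold structure and the score stay the same.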
Source journal: ACS Applied Bio Materials (Chemistry: Chemistry (all))
CiteScore: 9.40
Self-citation rate: 2.10%
Articles published: 464