Double machine learning for sample selection models

IF 2.5 | Zone 2 (Mathematics) | JCR Q1 (ECONOMICS)

Journal of Business & Economic Statistics Pub Date : 2023-10-16 DOI:10.1080/07350015.2023.2271071

Michela Bia, Martin Huber, Lukáš Lafférs


Citations: 5

Abstract




This paper considers the evaluation of discretely distributed treatments when outcomes are only observed for a subpopulation due to sample selection or outcome attrition. For identification, we combine a selection-on-observables assumption for treatment assignment with either selection-on-observables or instrumental variable assumptions concerning the outcome attrition/sample selection process. We also consider dynamic confounding, meaning that covariates that jointly affect sample selection and the outcome may (at least partly) be influenced by the treatment. To control in a data-driven way for a potentially high-dimensional set of pre- and/or post-treatment covariates, we adapt the double machine learning framework for treatment evaluation to sample selection problems. We make use of (a) Neyman-orthogonal, doubly robust, and efficient score functions, which imply the robustness of treatment effect estimation to moderate regularization biases in the machine learning-based estimation of the outcome, treatment, or sample selection models and (b) sample splitting (or cross-fitting) to prevent overfitting bias. We demonstrate that the proposed estimators are asymptotically normal and root-n consistent and investigate their finite sample properties in a simulation study. We also apply our proposed methodology to the Job Corps data. The estimator is available in the causalweight package for the statistical software R.

Keywords: sample selection; double machine learning; doubly robust estimation; efficient score

Disclaimer: As a service to authors and researchers we are providing this version of an accepted manuscript (AM). Copyediting, typesetting, and review of the resulting proofs will be undertaken on this manuscript before final publication of the Version of Record (VoR). During production and pre-press, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal relate to these versions also.
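To make the two ingredients named in the abstract concrete (a doubly robust score and cross-fitting), here is a minimal Python sketch for the simplest case: a binary treatment with selection on observables for both treatment and sample selection. It is an illustration under stated assumptions, not the causalweight implementation. The names `simulate`, `fit_nuisances`, `score`, and `dml_ate` are hypothetical, the data are simulated, and plain cell means over one discrete covariate stand in for the machine learners. The score used has the standard doubly robust form, mu(d,x) + 1{D=d} S (Y - mu(d,x)) / (p(d|x) pi(d,x)), where p is the treatment propensity, pi the selection probability, and mu the mean outcome among selected units.

```python
import random
from statistics import mean, stdev


def simulate(n, seed=0):
    """Toy data: covariate x, treatment d, selection s, outcome y (None if s == 0)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.choice([0, 1, 2])
        d = int(rng.random() < 0.3 + 0.2 * x)            # treatment depends on x
        s = int(rng.random() < 0.5 + 0.1 * x + 0.2 * d)  # selection depends on x and d only
        y = 1.0 * d + x + rng.gauss(0, 1)                # true average effect is 1
        rows.append((x, d, s, y if s else None))
    return rows


def fit_nuisances(train):
    """Cell estimates of p(d|x), pi(d,x) = P(S=1|d,x) and mu(d,x) = E[Y|d,S=1,x].

    Assumes every (x, d) cell in the training data contains selected units.
    """
    p, pi, mu = {}, {}, {}
    for x in {r[0] for r in train}:
        in_x = [r for r in train if r[0] == x]
        for d in (0, 1):
            in_xd = [r for r in in_x if r[1] == d]
            observed = [r[3] for r in in_xd if r[2] == 1]
            p[(d, x)] = len(in_xd) / len(in_x)
            pi[(d, x)] = len(observed) / len(in_xd)
            mu[(d, x)] = mean(observed)
    return p, pi, mu


def score(row, d, p, pi, mu):
    """Doubly robust score for the mean potential outcome under treatment d."""
    x, d_obs, s, y = row
    m = mu[(d, x)]
    if d_obs == d and s == 1:
        # Inverse-probability-weighted residual, only for selected units with D = d.
        return m + (y - m) / (p[(d, x)] * pi[(d, x)])
    return m


def dml_ate(rows, k=2, seed=0):
    """Cross-fitted estimate: nuisances fitted on k-1 folds, scores on the held-out fold."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    folds = [idx[j::k] for j in range(k)]
    psi = []
    for fold in folds:
        held_out = set(fold)
        train = [rows[i] for i in idx if i not in held_out]
        p, pi, mu = fit_nuisances(train)
        psi += [score(rows[i], 1, p, pi, mu) - score(rows[i], 0, p, pi, mu)
                for i in fold]
    return mean(psi), stdev(psi) / len(psi) ** 0.5


ate, se = dml_ate(simulate(4000))
print(f"ATE estimate: {ate:.3f} (standard error {se:.3f})")
```

With high-dimensional covariates, the cell means in `fit_nuisances` would be replaced by regularized or other machine learning estimators; the cross-fitting loop and the score stay the same. The instrumental-variable and dynamic-confounding cases mentioned in the abstract need different scores and are not covered by this sketch.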


Source journal

Journal of Business & Economic Statistics (Mathematics: Statistics & Probability)

CiteScore: 5.00

Self-citation rate: 6.70%

Publication volume: not listed

Review time: >12 weeks

About the journal: The Journal of Business and Economic Statistics (JBES) publishes a range of articles, primarily applied statistical analyses of microeconomic, macroeconomic, forecasting, business, and finance related topics. More general papers in statistics, econometrics, computation, simulation, or graphics are also appropriate if they are immediately applicable to the journal's general topics of interest. Articles published in JBES contain significant results, high-quality methodological content, excellent exposition, and usually include a substantive empirical application.