Double/debiased machine learning for treatment and structural parameters

Impact factor: 7 · Zone 4 (Economics) · JCR Q1 (Economics)

Econometrics Journal · Pub Date: 2017-06-24 · DOI: 10.1111/ectj.12097

Victor Chernozhukov, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, James Robins

Econometrics Journal, volume 21, issue 1, pages C1-C68.

Citations: 1512

Abstract

We revisit the classic semi-parametric problem of inference on a low-dimensional parameter θ₀ in the presence of high-dimensional nuisance parameters η₀. We depart from the classical setting by allowing for η₀ to be so high-dimensional that the traditional assumptions (e.g. Donsker properties) that limit complexity of the parameter space for this object break down. To estimate η₀, we consider the use of statistical or machine learning (ML) methods, which are particularly well suited to estimation in modern, very high-dimensional cases. ML methods perform well by employing regularization to reduce variance and trading off regularization bias with overfitting in practice. However, both regularization bias and overfitting in estimating η₀ cause a heavy bias in estimators of θ₀ that are obtained by naively plugging ML estimators of η₀ into estimating equations for θ₀. This bias results in the naive estimator failing to be N^{-1/2}-consistent, where N is the sample size. We show that the impact of regularization bias and overfitting on estimation of the parameter of interest θ₀ can be removed by using two simple, yet critical, ingredients: (1) using Neyman-orthogonal moments/scores that have reduced sensitivity with respect to nuisance parameters to estimate θ₀; (2) making use of cross-fitting, which provides an efficient form of data-splitting. We call the resulting set of methods double or debiased ML (DML). We verify that DML delivers point estimators that concentrate in an N^{-1/2}-neighbourhood of the true parameter values and are approximately unbiased and normally distributed, which allows construction of valid confidence statements. The generic statistical theory of DML is elementary and simultaneously relies on only weak theoretical requirements, which will admit the use of a broad array of modern ML methods for estimating the nuisance parameters, such as random forests, lasso, ridge, deep neural nets, boosted trees, and various hybrids and ensembles of these methods.
We illustrate the general theory by applying it to provide theoretical properties of the following: DML applied to learn the main regression parameter in a partially linear regression model; DML applied to learn the coefficient on an endogenous variable in a partially linear instrumental variables model; DML applied to learn the average treatment effect and the average treatment effect on the treated under unconfoundedness; DML applied to learn the local average treatment effect in an instrumental variables setting. In addition to these theoretical applications, we also illustrate the use of DML in three empirical examples.
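The two ingredients named in the abstract, an orthogonal score and cross-fitting, can be illustrated with a minimal sketch for the partially linear regression case. This is not the authors' code. Everything specific here is an assumption made for illustration: the simulated design (the functions sin(2πx) and x², unit-variance Gaussian noise, θ₀ = 0.5), the helper names `simulate`, `fit_bin_means` and `dml_plr`, and the piecewise-constant learner, which merely stands in for an ML method such as a random forest or lasso. The score used is the partialling-out (residual-on-residual) score, one Neyman-orthogonal choice for this model.

```python
import math
import random


def simulate(n, theta0, rng):
    """Draw (Y, D, X) from a toy partially linear model (illustrative design)."""
    data = []
    for _ in range(n):
        x = rng.random()
        d = math.sin(2 * math.pi * x) + rng.gauss(0, 1)  # D = m0(X) + V
        y = theta0 * d + x ** 2 + rng.gauss(0, 1)         # Y = theta0*D + g0(X) + U
        data.append((y, d, x))
    return data


def fit_bin_means(xs, targets, n_bins=20):
    """Stand-in nuisance learner: piecewise-constant regression on [0, 1)."""
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for x, t in zip(xs, targets):
        b = min(int(x * n_bins), n_bins - 1)
        sums[b] += t
        counts[b] += 1
    overall = sum(targets) / len(targets)  # fallback for empty bins
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda x: means[min(int(x * n_bins), n_bins - 1)]


def dml_plr(data, n_folds=5, rng=None):
    """Cross-fitted estimate of theta0 with a plug-in standard error."""
    rng = rng or random.Random(0)
    idx = list(range(len(data)))
    rng.shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    res_y, res_d = [], []
    for fold in folds:
        held_out = set(fold)
        # Cross-fitting: nuisance functions are fitted only on the other folds.
        train = [data[i] for i in idx if i not in held_out]
        xs = [x for _, _, x in train]
        l_hat = fit_bin_means(xs, [y for y, _, _ in train])  # estimates E[Y | X]
        m_hat = fit_bin_means(xs, [d for _, d, _ in train])  # estimates E[D | X]
        for i in fold:
            y, d, x = data[i]
            res_y.append(y - l_hat(x))
            res_d.append(d - m_hat(x))
    n = len(data)
    # Solve the pooled orthogonal moment: mean[(Y~ - theta * D~) * D~] = 0.
    j = sum(v * v for v in res_d) / n
    theta = sum(v * u for v, u in zip(res_d, res_y)) / n / j
    psi_sq = sum(((u - theta * v) * v) ** 2 for v, u in zip(res_d, res_y)) / n
    se = math.sqrt(psi_sq / (j * j) / n)
    return theta, se


rng = random.Random(0)
data = simulate(4000, 0.5, rng)
theta_hat, se = dml_plr(data, rng=rng)
print(f"theta_hat = {theta_hat:.3f}, "
      f"95% CI = [{theta_hat - 1.96 * se:.3f}, {theta_hat + 1.96 * se:.3f}]")
```

Because each observation's residuals come from nuisance fits that never saw that observation, overfitting in the learner does not leak into the final moment equation. Swapping `fit_bin_means` for a different learner leaves the rest of the sketch unchanged.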


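Source journal

Econometrics Journal (subject categories: Management Science; Mathematics, Interdisciplinary Applications)

CiteScore: 4.20

Self-citation rate: 5.30%

Articles published: (not shown)

Review time: >12 weeks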

About the journal: The Econometrics Journal was established in 1998 by the Royal Economic Society with the aim of creating a top international field journal for the publication of econometric research with a standard of intellectual rigour and academic standing similar to those of the pre-existing top field journals in econometrics. The Econometrics Journal is committed to publishing first-class papers in macro-, micro- and financial econometrics. It is a general journal for econometric research open to all areas of econometrics, whether applied, computational, methodological or theoretical contributions.