Double Machine Learning meets Panel Data -- Promises, Pitfalls, and Potential Solutions

arXiv - ECON - Econometrics | Pub Date: 2024-09-02 | DOI: arxiv-2409.01266

Jonathan Fuhr, Dominik Papies


Citations: 0

Abstract

Estimating causal effects using machine learning (ML) algorithms can help to relax functional form assumptions if used within appropriate frameworks. However, most of these frameworks assume cross-sectional data, whereas researchers often have access to panel data, which in traditional methods helps to deal with unobserved heterogeneity between units. In this paper, we explore how to adapt double/debiased machine learning (DML) (Chernozhukov et al., 2018) to panel data in the presence of unobserved heterogeneity. This adaptation is challenging because DML's cross-fitting procedure assumes independent data, and the unobserved heterogeneity is not necessarily additively separable in settings with nonlinear observed confounding. We assess the performance of several intuitively appealing estimators in a variety of simulations. While we find violations of the cross-fitting assumptions to be largely inconsequential for the accuracy of the effect estimates, many of the considered methods fail to adequately account for the presence of unobserved heterogeneity. However, we find that using predictive models based on the correlated random effects approach (Mundlak, 1978) within DML leads to accurate coefficient estimates across settings, given a sample size that is large relative to the number of observed confounders. We also show that the influence of the unobserved heterogeneity on the observed confounders plays a significant role in the performance of most alternative methods.
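To make the idea of "correlated random effects predictive models within DML" concrete, here is a minimal Python sketch under our own assumptions; it is not the authors' implementation. It assumes a partially linear model y = theta*d + g(x) + alpha_i + e, a single time-varying confounder x, and a made-up data-generating process in which the unit effect alpha enters additively (which sidesteps the non-separability issue raised above). The Mundlak step simply appends each unit's time-average of the covariates to the feature set before fitting the nuisance models. A ridge-stabilized polynomial least-squares fit stands in for a flexible ML learner, and folds are assigned by unit so that all rows of a unit stay together; this is one design choice, not a requirement implied by the abstract. All function names (`mundlak_features`, `poly_fit_predict`, `dml_plr`, `simulate`) are hypothetical.

```python
import numpy as np


def mundlak_features(X, unit):
    """Append each unit's time-average of every covariate (Mundlak/CRE device)."""
    X = np.asarray(X, dtype=float)
    _, idx = np.unique(unit, return_inverse=True)
    sums = np.zeros((idx.max() + 1, X.shape[1]))
    np.add.at(sums, idx, X)                   # per-unit column sums
    means = sums / np.bincount(idx)[:, None]  # per-unit column means
    return np.hstack([X, means[idx]])         # copy each unit's means to its rows


def poly_fit_predict(X_tr, y_tr, X_te, degree=2, ridge=1e-8):
    """Stand-in learner: ridge-stabilized least squares on per-column powers."""
    def expand(Z):
        cols = [np.ones(len(Z))]
        for p in range(1, degree + 1):
            cols.extend((Z ** p).T)           # one column per covariate power
        return np.column_stack(cols)
    A = expand(X_tr)
    coef = np.linalg.solve(A.T @ A + ridge * np.eye(A.shape[1]), A.T @ y_tr)
    return expand(X_te) @ coef


def dml_plr(y, d, X, unit, n_folds=5, seed=0):
    """Partialling-out DML for y = theta*d + g(X) + e, with folds drawn by unit."""
    rng = np.random.default_rng(seed)
    _, idx = np.unique(unit, return_inverse=True)
    n_units = idx.max() + 1
    fold = (rng.permutation(n_units) % n_folds)[idx]  # a unit's rows share a fold
    y_res = np.empty(len(y))
    d_res = np.empty(len(d))
    for k in range(n_folds):
        te, tr = fold == k, fold != k
        # Nuisance models are fit on the other folds and evaluated on fold k.
        y_res[te] = y[te] - poly_fit_predict(X[tr], y[tr], X[te])
        d_res[te] = d[te] - poly_fit_predict(X[tr], d[tr], X[te])
    # Residual-on-residual regression gives the DML estimate of theta.
    return float(d_res @ y_res / (d_res @ d_res))


def simulate(n_units=400, n_periods=20, theta=0.5, c=0.5, seed=1):
    """Hypothetical DGP: the unit effect alpha shifts x, d and y additively."""
    rng = np.random.default_rng(seed)
    unit = np.repeat(np.arange(n_units), n_periods)
    alpha = rng.normal(size=n_units)[unit]      # unobserved heterogeneity
    x = alpha + rng.normal(size=unit.size)      # observed confounder, tied to alpha
    d = 0.5 * x**2 + c * alpha + rng.normal(size=unit.size)
    y = theta * d + x**2 - x + alpha + rng.normal(size=unit.size)
    return y, d, x[:, None], unit


y, d, X, unit = simulate()
theta_naive = dml_plr(y, d, X, unit)                        # covariates only
theta_cre = dml_plr(y, d, mundlak_features(X, unit), unit)  # plus unit means
print(f"naive: {theta_naive:.3f}  CRE: {theta_cre:.3f}  (true theta = 0.5)")
```

Under this made-up DGP, conditioning on x alone absorbs only part of alpha, so the covariate-only estimate should drift noticeably away from 0.5, whereas adding the unit means lets the nuisance models capture most of alpha's influence. A small residual bias should remain because a 20-period average only approximates alpha.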



