Double Machine Learning meets Panel Data -- Promises, Pitfalls, and Potential Solutions
Jonathan Fuhr, Dominik Papies
arXiv - ECON - Econometrics, 2024-09-02, arXiv:2409.01266
Estimating causal effects using machine learning (ML) algorithms can help to
relax functional form assumptions if used within appropriate frameworks.
However, most of these frameworks assume settings with cross-sectional data,
whereas researchers often have access to panel data, which in traditional
methods helps to deal with unobserved heterogeneity between units. In this
paper, we explore how we can adapt double/debiased machine learning (DML)
(Chernozhukov et al., 2018) for panel data in the presence of unobserved
heterogeneity. This adaptation is challenging because DML's cross-fitting
procedure assumes independent data and because the unobserved heterogeneity is not
necessarily additively separable in settings with nonlinear observed
confounding. We assess the performance of several intuitively appealing
estimators in a variety of simulations. While we find violations of the
cross-fitting assumptions to be largely inconsequential for the accuracy of the
effect estimates, many of the considered methods fail to adequately account for
the presence of unobserved heterogeneity. However, we find that using
predictive models based on the correlated random effects approach (Mundlak,
1978) within DML leads to accurate coefficient estimates across settings, given
a sample size that is large relative to the number of observed confounders. We
also show that the influence of the unobserved heterogeneity on the observed
confounders plays a significant role in the performance of most alternative
methods.
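
To make the idea of combining DML with the correlated random effects approach concrete, the following is a minimal sketch and not the authors' implementation. It simulates a toy panel, augments the observed confounder with its unit mean (the Mundlak device), and runs a partialling-out DML estimator with cross-fitting. Everything specific here is an assumption made for illustration: the data-generating process (in which the unit effects depend on the data only through the unit mean of x), the function names `simulate_panel`, `features`, and `dml_plr`, the choice to assign folds by unit, and the use of least squares on a small polynomial basis as a stand-in for a flexible ML learner.

```python
import numpy as np


def simulate_panel(n_units=500, n_periods=5, theta=0.5, seed=0):
    """Toy panel (hypothetical DGP): unit effects depend on the unit mean of x."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n_units, 1))                    # latent unit-level driver of x
    x = z + rng.normal(size=(n_units, n_periods))        # observed confounder x_it
    x_bar = x.mean(axis=1, keepdims=True)                # unit mean of x
    c = x_bar + 0.5 * rng.normal(size=(n_units, 1))      # unit effect in treatment equation
    alpha = x_bar + 0.5 * rng.normal(size=(n_units, 1))  # unit effect in outcome equation
    d = 0.5 * x**2 + c + rng.normal(size=x.shape)        # treatment, nonlinear in x
    y = theta * d + x + x**2 + alpha + rng.normal(size=x.shape)
    unit = np.repeat(np.arange(n_units), n_periods)      # unit ids 0..n_units-1, row-major
    return unit, x.ravel(), d.ravel(), y.ravel()


def features(x, unit, use_unit_means):
    """Polynomial basis in x, optionally augmented with the unit mean of x."""
    cols = [np.ones_like(x), x, x**2]
    if use_unit_means:
        # Unit means use covariates only, so computing them on the full sample
        # does not leak outcome or treatment information across folds.
        x_bar = (np.bincount(unit, weights=x) / np.bincount(unit))[unit]
        cols += [x_bar, x_bar**2]
    return np.column_stack(cols)


def dml_plr(unit, x, d, y, use_unit_means, n_folds=5, seed=0):
    """Partialling-out DML for a partially linear model with cross-fitting by unit."""
    rng = np.random.default_rng(seed)
    feats = features(x, unit, use_unit_means)
    n_units = unit.max() + 1
    # All observations of a unit share one fold (an assumption of this sketch).
    fold = (rng.permutation(n_units) % n_folds)[unit]
    d_res = np.empty_like(d)
    y_res = np.empty_like(y)
    for k in range(n_folds):
        test = fold == k
        train = ~test
        # Least squares stands in for an ML learner of E[d | features] and E[y | features].
        beta_d, *_ = np.linalg.lstsq(feats[train], d[train], rcond=None)
        beta_y, *_ = np.linalg.lstsq(feats[train], y[train], rcond=None)
        d_res[test] = d[test] - feats[test] @ beta_d
        y_res[test] = y[test] - feats[test] @ beta_y
    # Regress the outcome residuals on the treatment residuals.
    return float(d_res @ y_res / (d_res @ d_res))


if __name__ == "__main__":
    unit, x, d, y = simulate_panel()
    print("with unit means (CRE):", dml_plr(unit, x, d, y, use_unit_means=True))
    print("without unit means:   ", dml_plr(unit, x, d, y, use_unit_means=False))
```

In this toy setting the estimate that ignores the unit means should be biased away from the true coefficient of 0.5, because the unit effects in the treatment and outcome equations share a component that the per-period confounder alone cannot absorb. Whether unit means suffice in practice depends on how the unobserved heterogeneity relates to the observed confounders, which is the question the simulations in the paper address.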