Finite Sample Evaluation of Causal Machine Learning Methods: Guidelines for the Applied Researcher

A. Naghi
DOI: 10.2139/ssrn.3942461 (https://doi.org/10.2139/ssrn.3942461)
Journal: Econometrics: Econometric & Statistical Methods - Special Topics eJournal
Publication date: 2021-10-14
Citations: 2

Abstract

The econometrics literature has proposed several new causal machine learning (CML) methods in the past few years. These methods harness the strength of machine learning to flexibly model the relationship between the treatment, outcome and confounders, while providing valid inferential statements. Although numerous options are now available to the applied economics researcher, there is limited guidance on the most useful methodology for a particular applied setting. In this paper, we perform a comprehensive evaluation of the finite sample performance of recently introduced CML methods from the econometrics literature, under a wide range of data generating processes. We focus our analysis on data features that are relevant for causal inference, such as varying degrees of nonlinearity in the outcome and treatment equations, overlap, percentage of treated, alignment, and heterogeneity in the treatment effect. We evaluate the methods that have so far received the most attention in the empirical economics literature: double machine learning, causal forest and the generic machine learning methods. Results on the bias, root mean squared error, coverage rates and interval lengths for the average treatment effect, group average treatment effects and individual treatment effects reveal the characteristics of the methods and the data features that affect their performance the most.
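The four evaluation criteria named in the abstract (bias, root mean squared error, coverage rate and interval length) can be illustrated with a minimal Monte Carlo sketch. This is not the paper's simulation design: the data generating process, the function names (`simulate`, `ate_regression`, `evaluate`) and all parameter values are assumptions chosen for illustration, and a plain OLS regression stands in for the CML estimators the paper actually studies.

```python
import numpy as np


def simulate(n, tau, rng):
    """Hypothetical DGP: one confounder x, binary treatment d, constant effect tau."""
    x = rng.normal(size=n)
    p = 1.0 / (1.0 + np.exp(-x))      # propensity score depends on x (confounding)
    d = rng.binomial(1, p)
    y = tau * d + x + rng.normal(size=n)
    return x, d, y


def ate_regression(x, d, y):
    """OLS of y on (1, d, x): returns the coefficient on d and its standard error.

    A stand-in estimator for illustration only, not a CML method.
    """
    Z = np.column_stack([np.ones_like(x), d, x])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    sigma2 = resid @ resid / (len(y) - Z.shape[1])
    cov = sigma2 * np.linalg.inv(Z.T @ Z)
    return beta[1], np.sqrt(cov[1, 1])


def evaluate(n=500, tau=1.0, reps=500, seed=0):
    """Repeat the experiment and summarise finite sample performance."""
    rng = np.random.default_rng(seed)
    est = np.empty(reps)
    se = np.empty(reps)
    for r in range(reps):
        est[r], se[r] = ate_regression(*simulate(n, tau, rng))
    lo, hi = est - 1.96 * se, est + 1.96 * se   # nominal 95% intervals
    return {
        "bias": est.mean() - tau,
        "rmse": np.sqrt(np.mean((est - tau) ** 2)),
        "coverage": np.mean((lo <= tau) & (tau <= hi)),
        "interval_length": np.mean(hi - lo),
    }


if __name__ == "__main__":
    for name, value in evaluate().items():
        print(f"{name}: {value:.3f}")
```

Because the outcome equation in this toy design is linear and correctly specified, the OLS estimate should show little bias and coverage close to the nominal level. Data features of the kind the abstract lists, such as nonlinearity or poor overlap, could be introduced by changing `simulate`.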