Addressing sample selection bias for machine learning methods

IF 2.3 | Zone 3 (Economics) | Q2 ECONOMICS
Dylan Brewer, Alyssa Carlson
Journal of Applied Econometrics. Published 2024-01-17. DOI: 10.1002/jae.3029
https://onlinelibrary.wiley.com/doi/10.1002/jae.3029
Citations: 0

Abstract

We study approaches for adjusting machine learning methods when the training sample differs from the prediction sample on unobserved dimensions. The machine learning literature predominantly assumes selection only on observed dimensions. The common remedies for selection on observables are to weight observations or to include the variables that influence selection. Simulation results show that selection on unobservables increases mean squared prediction error for popular machine learning algorithms. Common machine learning practices, such as weighting or including variables that influence selection into the training or prediction sample, often worsen sample selection bias. We propose two control function approaches that remove the effects of selection bias before training and find that they reduce mean squared prediction error in simulations. We apply these approaches to predicting the vote share of the incumbent in gubernatorial elections using previously observed re-election bids. We find that ignoring selection on unobservables leads to substantially higher predicted vote shares for the incumbent than when the control function approach is used.
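The control function idea in the abstract can be illustrated with a small simulation. The sketch below is not the authors' procedure: it assumes a textbook Heckman-style two-step correction (a probit selection model, then the inverse Mills ratio as the control function), a linear learner in place of a machine learning algorithm, and a hypothetical data-generating process with an excluded variable `z` that shifts selection but not the outcome. All variable names and parameter values are illustrative assumptions.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n, rho = 20_000, 0.8  # rho: assumed correlation between selection and outcome errors

# Hypothetical population: z affects selection only (exclusion restriction).
x = rng.normal(size=n)
z = rng.normal(size=n)
u = rng.normal(size=n)                                    # selection error
e = rho * u + math.sqrt(1 - rho**2) * rng.normal(size=n)  # outcome error, corr(u, e) = rho
y = 1.0 + 2.0 * x + e
s = (0.5 * x + z + u > 0).astype(float)                   # 1 if the outcome is observed
sel = s == 1

def norm_pdf(t):
    return np.exp(-0.5 * t**2) / math.sqrt(2 * math.pi)

_erf = np.vectorize(math.erf)

def norm_cdf(t):
    return 0.5 * (1.0 + _erf(t / math.sqrt(2)))

def fit_probit(W, d, iters=25):
    """Probit maximum likelihood via Fisher scoring."""
    g = np.zeros(W.shape[1])
    for _ in range(iters):
        eta = W @ g
        p = np.clip(norm_cdf(eta), 1e-10, 1 - 1e-10)
        phi = norm_pdf(eta)
        score = W.T @ (phi * (d - p) / (p * (1 - p)))
        info = W.T @ (W * (phi**2 / (p * (1 - p)))[:, None])
        g = g + np.linalg.solve(info, score)
    return g

def ols(X, t):
    return np.linalg.lstsq(X, t, rcond=None)[0]

# Step 1: model selection on (x, z) and build the control function
# (inverse Mills ratio) for the selected observations.
W = np.column_stack([np.ones(n), x, z])
index = W @ fit_probit(W, s)
lam = norm_pdf(index[sel]) / norm_cdf(index[sel])

X_sel = np.column_stack([np.ones(int(sel.sum())), x[sel]])

# Naive learner: train on the selected sample and ignore selection.
beta_naive = ols(X_sel, y[sel])

# Step 2: estimate the control function coefficient, subtract the
# selection term from the outcome, then train on the adjusted outcome.
coef = ols(np.column_stack([X_sel, lam]), y[sel])
y_adj = y[sel] - coef[-1] * lam
beta_cf = ols(X_sel, y_adj)

# Compare prediction error over the whole simulated population.
X_all = np.column_stack([np.ones(n), x])
mse_naive = np.mean((y - X_all @ beta_naive) ** 2)
mse_cf = np.mean((y - X_all @ beta_cf) ** 2)
print(f"naive MSE: {mse_naive:.3f}  control-function MSE: {mse_cf:.3f}")
```

In this toy setup the naive learner absorbs the selected sample's positive mean error into its predictions, while the adjusted learner targets the population relationship. The linear learner is only a stand-in; the adjusted outcome `y_adj` could in principle be passed to any regression algorithm.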

Source journal
CiteScore: 3.70
Self-citation rate: 4.80%
Articles published: 63
About the journal: The Journal of Applied Econometrics is an international journal published bi-monthly, plus 1 additional issue (total 7 issues). It aims to publish articles of high quality dealing with the application of existing as well as new econometric techniques to a wide variety of problems in economics and related subjects, covering topics in measurement, estimation, testing, forecasting, and policy analysis. The emphasis is on the careful and rigorous application of econometric techniques and the appropriate interpretation of the results. The economic content of the articles is stressed. A special feature of the Journal is its emphasis on the replicability of results by other researchers. To achieve this aim, authors are expected to make available a complete set of the data used as well as any specialised computer programs employed through a readily accessible medium, preferably in a machine-readable form. The use of microcomputers in applied research and transferability of data is emphasised. The Journal also features occasional sections of short papers re-evaluating previously published papers. The intention of the Journal of Applied Econometrics is to provide an outlet for innovative, quantitative research in economics which cuts across areas of specialisation, involves transferable techniques, and is easily replicable by other researchers. Contributions that introduce statistical methods that are applicable to a variety of economic problems are actively encouraged. The Journal also aims to publish review and survey articles that make recent developments in the field of theoretical and applied econometrics more readily accessible to applied economists in general.