Estimating Average Treatment Effects With Propensity Scores Estimated With Four Machine Learning Procedures: Simulation Results in High Dimensional Settings and With Time to Event Outcomes

Kip Brown, P. Merrigan, Jimmy Royer
Journal: ERN: Simulation Methods (Topic)
Publication date: 2018-09-21
DOI: 10.2139/ssrn.3272396 (https://doi.org/10.2139/ssrn.3272396)
Citations: 6

Abstract

Background: The increased availability of claims data makes it possible to build high-dimensional datasets, rich in covariates, for accurately estimating treatment effects in medical and epidemiological cohort studies. This paper shows the potential of machine learning for estimating average treatment effects with propensity score methods in the context of rich, high-dimensional datasets.

Methods: Four machine learning methods are used to estimate the propensity scores from which average treatment effects are computed, in the context of time-to-event outcomes: LASSO, Random Forest, Gradient Boosting, and Artificial Neural Networks. Simulations based on an actual medical claims dataset are used to assess the efficiency of these methods. The simulations are performed with over 100,000 observations and 1,100 explanatory variables. Each method is tested on 500 datasets created from the original dataset, which allows the mean and standard deviation of the estimated average treatment effects to be reported.

Results: The results are very promising for all four methods; however, LASSO, Random Forest and Gradient Boosting seem to be performing better than Random Forest.

Conclusion: Machine learning methods can be helpful for observational studies that use the propensity score when a very large number of covariates is available, the total number of observations is large, and the dependent event is rare. This is an important result given the availability of big data related to Health Economics and Outcomes Research (HEOR) around the world.
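The propensity-score workflow summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it covers only the LASSO-style variant, uses a small simulated dataset with a continuous outcome rather than claims data with time-to-event outcomes, and assumes an inverse-probability-weighting (IPW) estimator, which the abstract does not specify. All function names (`fit_l1_logistic`, `propensity_scores`, `ipw_ate`, `simulate`, `naive_difference`) and parameter values are hypothetical, and the L1-penalized logistic regression is fitted with a simple proximal gradient loop to keep the example dependency-free.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(w, thresh):
    """Proximal operator of the L1 penalty (the LASSO shrinkage step)."""
    if w > thresh:
        return w - thresh
    if w < -thresh:
        return w + thresh
    return 0.0


def fit_l1_logistic(X, t, lam=0.01, lr=1.0, n_iter=100):
    """L1-penalized logistic regression of treatment on covariates,
    fitted by proximal gradient descent. Returns (weights, intercept)."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, ti in zip(X, t):
            resid = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - ti
            grad_b += resid
            for j in range(p):
                grad_w[j] += resid * xi[j]
        b -= lr * grad_b / n  # the intercept is not penalized
        w = [soft_threshold(wj - lr * gj / n, lr * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b


def propensity_scores(X, w, b, clip=0.01):
    """Predicted P(T=1 | X), clipped away from 0 and 1 to bound the weights."""
    scores = []
    for xi in X:
        e = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
        scores.append(min(max(e, clip), 1.0 - clip))
    return scores


def ipw_ate(y, t, ps):
    """Normalized (Hajek) inverse-probability-weighted ATE estimate."""
    w1 = [ti / e for ti, e in zip(t, ps)]
    w0 = [(1 - ti) / (1.0 - e) for ti, e in zip(t, ps)]
    mean1 = sum(wi * yi for wi, yi in zip(w1, y)) / sum(w1)
    mean0 = sum(wi * yi for wi, yi in zip(w0, y)) / sum(w0)
    return mean1 - mean0


def naive_difference(y, t):
    """Unadjusted difference in mean outcomes, treated minus control."""
    treated = [yi for yi, ti in zip(y, t) if ti == 1]
    control = [yi for yi, ti in zip(y, t) if ti == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def simulate(n=2000, p=8, true_ate=2.0, seed=0):
    """Toy data: only the first two covariates confound treatment and outcome."""
    rng = random.Random(seed)
    X, t, y = [], [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(p)]
        ti = 1 if rng.random() < sigmoid(0.8 * x[0] - 0.5 * x[1]) else 0
        yi = true_ate * ti + 1.5 * x[0] - 1.0 * x[1] + rng.gauss(0.0, 1.0)
        X.append(x)
        t.append(ti)
        y.append(yi)
    return X, t, y


if __name__ == "__main__":
    X, t, y = simulate()
    w, b = fit_l1_logistic(X, t)
    ps = propensity_scores(X, w, b)
    print("naive difference:", round(naive_difference(y, t), 3))
    print("IPW ATE estimate:", round(ipw_ate(y, t, ps), 3))
```

In this toy setup the unadjusted difference is confounded by the first two covariates, while the weighted estimate should land much closer to the simulated effect of 2.0. Repeating the procedure over many resampled datasets, as the abstract describes, would give the mean and standard deviation of the estimates. For time-to-event outcomes the same weights would typically feed a weighted survival model rather than a difference in means.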