Estimating Average Treatment Effects With Propensity Scores Estimated With Four Machine Learning Procedures: Simulation Results in High Dimensional Settings and With Time to Event Outcomes

ERN: Simulation Methods (Topic) | Pub Date: 2018-09-21 | DOI: 10.2139/ssrn.3272396

Kip Brown, P. Merrigan, Jimmy Royer


Citations: 6

Abstract

Background: The increased availability of claims data makes it possible to build high-dimensional datasets, rich in covariates, for accurately estimating treatment effects in medical and epidemiological cohort studies. This paper shows the potential of machine learning for estimating average treatment effects with propensity score methods in rich, high-dimensional datasets.

Methods: Four machine learning methods are used to estimate propensity scores, from which average treatment effects are estimated for time-to-event outcomes: LASSO, Random Forest, Gradient Descent Boosting, and Artificial Neural Networks. Simulations based on an actual medical claims dataset are used to assess the efficiency of these methods. The simulations use over 100,000 observations and 1,100 explanatory variables. Each method is tested on 500 datasets created from the original dataset, which allows us to report the mean and standard deviation of the estimated average treatment effects.

Results: The results are very promising for all four methods; however, LASSO, Random Forest and Gradient Boosting seem to perform better than Random Forest.

Conclusion: Machine learning methods can be helpful for observational studies that use the propensity score when a very large number of covariates is available, the total number of observations is large, and the dependent event is rare. This is an important result given the availability of big data related to Health Economics and Outcomes Research (HEOR) around the world.
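The propensity-score workflow summarized in the Methods can be illustrated with a minimal sketch. The Python below is not the authors' code and rests on several assumptions: it uses small simulated data with a continuous outcome rather than claims data with time-to-event outcomes; an L1-penalized logistic regression fitted by proximal gradient descent stands in for the LASSO propensity model; and a normalized inverse-probability-weighted (IPW) estimator is used to turn scores into an average treatment effect, although the abstract does not state which propensity-score estimator the paper applies. All function names, coefficients, and tuning values (`lam`, `step`, `clip`) are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink a value toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_l1_logistic(X, t, lam=0.005, step=1.0, n_iter=100):
    """L1-penalized logistic regression fitted by proximal gradient descent.

    Returns (intercept, coefficients); the intercept is not penalized.
    """
    n, p = len(X), len(X[0])
    intercept, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        grad0, grad = 0.0, [0.0] * p
        for xi, ti in zip(X, t):
            resid = sigmoid(intercept + sum(b * x for b, x in zip(beta, xi))) - ti
            grad0 += resid
            for j in range(p):
                grad[j] += resid * xi[j]
        intercept -= step * grad0 / n
        # Gradient step on the log-loss, then soft-thresholding for the L1 term.
        beta = [soft_threshold(b - step * g / n, step * lam)
                for b, g in zip(beta, grad)]
    return intercept, beta


def ipw_ate(y, t, scores, clip=0.01):
    """Normalized (Hajek) inverse-probability-weighted ATE estimate."""
    num1 = den1 = num0 = den0 = 0.0
    for yi, ti, e in zip(y, t, scores):
        e = min(max(e, clip), 1.0 - clip)  # assumed guard against extreme weights
        if ti == 1:
            num1 += yi / e
            den1 += 1.0 / e
        else:
            num0 += yi / (1.0 - e)
            den0 += 1.0 / (1.0 - e)
    return num1 / den1 - num0 / den0


def simulate_and_estimate(n=2000, p=6, true_ate=2.0, seed=0):
    """Simulate confounded data; return (naive difference, IPW estimate)."""
    rng = random.Random(seed)
    X, t, y = [], [], []
    for _ in range(n):
        xi = [rng.gauss(0.0, 1.0) for _ in range(p)]
        # Hypothetical design: only the first two covariates are confounders.
        ti = 1 if rng.random() < sigmoid(0.8 * xi[0] + 0.5 * xi[1]) else 0
        yi = true_ate * ti + 1.5 * xi[0] + 1.0 * xi[1] + rng.gauss(0.0, 1.0)
        X.append(xi)
        t.append(ti)
        y.append(yi)
    intercept, beta = fit_l1_logistic(X, t)
    scores = [sigmoid(intercept + sum(b * x for b, x in zip(beta, xi)))
              for xi in X]
    treated = [yi for yi, ti in zip(y, t) if ti == 1]
    control = [yi for yi, ti in zip(y, t) if ti == 0]
    naive = sum(treated) / len(treated) - sum(control) / len(control)
    return naive, ipw_ate(y, t, scores)


if __name__ == "__main__":
    naive, ipw = simulate_and_estimate()
    print(f"naive difference in means: {naive:.3f}")
    print(f"IPW estimate (true ATE is 2.0): {ipw:.3f}")
```

Calling `simulate_and_estimate` with many different seeds and reporting the mean and standard deviation of the returned IPW estimates would mirror, on a toy scale, the repeated-dataset design described in the abstract. In this simulated setup the naive difference in means is confounded by construction, so the weighted estimate should sit much closer to the assumed true effect.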


