Predictive analytics of insurance claims using multivariate decision trees

Impact factor: 0.8 · JCR quartile: Q4 (Statistics & Probability)

Dependence Modeling Pub Date : 2018-07-18 DOI:10.1515/demo-2018-0022

Zhiyu Quan, Emiliano A. Valdez

Dependence Modeling, vol. 6, no. 1, pp. 377-407. Journal article, published 2018-07-18. https://doi.org/10.1515/demo-2018-0022

Citations: 29

Abstract

Because of their many advantages, decision trees have become an increasingly popular alternative predictive tool for building classification and regression models. Their origins date back about five decades; the algorithm can be broadly described as repeatedly partitioning the regions of the explanatory variables, thereby creating a tree-based model for predicting the response. Innovations to the original methods, such as random forests and gradient boosting, have further improved the capabilities of decision trees as predictive models. In addition, extensions of decision trees to multivariate response variables have started to develop, and the purpose of this paper is to apply multivariate tree models to insurance claims data with correlated responses. This extension to multivariate response variables inherits several advantages of univariate decision tree models, such as being distribution-free, the ability to rank essential explanatory variables, and high predictive accuracy, to name a few. To illustrate the approach, we analyze a dataset drawn from the Wisconsin Local Government Property Insurance Fund (LGPIF), which offers multi-line insurance coverage of property, motor vehicle, and contractors' equipment. With multivariate tree models, we are able to capture the inherent relationship among the response variables, and we find that the marginal predictive model based on multivariate trees improves prediction accuracy over that based simply on univariate trees.
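The recursive partitioning described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the split criterion (squared error summed over all response columns), the function names `sse`, `best_split`, `grow` and `predict`, the stopping parameters, and the toy data are all assumptions chosen for illustration, since the abstract does not specify them.

```python
from statistics import fmean


def sse(rows):
    """Squared error of each response column around its mean, summed over columns."""
    if not rows:
        return 0.0
    total = 0.0
    for col in zip(*rows):
        m = fmean(col)
        total += sum((v - m) ** 2 for v in col)
    return total


def best_split(X, Y, min_leaf):
    """Try every feature and midpoint threshold; return (cost, feature, threshold) or None."""
    best = None
    for j in range(len(X[0])):
        values = sorted({x[j] for x in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for x, y in zip(X, Y) if x[j] <= t]
            right = [y for x, y in zip(X, Y) if x[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, j, t)
    return best


def grow(X, Y, depth=0, max_depth=2, min_leaf=2):
    """Recursively partition the data; each leaf stores a vector of response means."""
    split = best_split(X, Y, min_leaf) if depth < max_depth else None
    if split is None or split[0] >= sse(Y):
        return {"leaf": [fmean(col) for col in zip(*Y)]}
    _, j, t = split
    left = [(x, y) for x, y in zip(X, Y) if x[j] <= t]
    right = [(x, y) for x, y in zip(X, Y) if x[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": grow([x for x, _ in left], [y for _, y in left],
                     depth + 1, max_depth, min_leaf),
        "right": grow([x for x, _ in right], [y for _, y in right],
                      depth + 1, max_depth, min_leaf),
    }


def predict(tree, x):
    """Follow the splits down to a leaf and return its vector of response means."""
    while "leaf" not in tree:
        branch = "left" if x[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]


# Made-up toy data: one explanatory variable, two correlated responses.
X = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]
Y = [[1.0, 10.0], [1.0, 10.0], [1.0, 10.0], [5.0, 50.0], [5.0, 50.0], [5.0, 50.0]]
tree = grow(X, Y, max_depth=1, min_leaf=2)
```

Because a single split is chosen for all responses at once, each leaf predicts the whole response vector jointly, which is one simple way a tree can reflect dependence among responses. Calling `predict(tree, [2.5])` on the toy data returns the mean response vector of the leaf that contains the point.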


Source journal: Dependence Modeling (Statistics & Probability)

CiteScore: 1.00

Self-citation rate: 0.00%

Review time: 12 weeks

Journal description: The journal Dependence Modeling aims to provide a medium for exchanging results and ideas in the area of multivariate dependence modeling. It is an open access, fully peer-reviewed journal providing readers with free, instant, and permanent access to all content worldwide. Dependence Modeling is listed by Web of Science (Emerging Sources Citation Index), Scopus, MathSciNet and Zentralblatt Math.

The journal presents different types of articles:
- "Research Articles" on fundamental theoretical aspects, as well as on significant applications in science, engineering, economics, finance, insurance and other fields.
- "Review Articles", which present the existing literature on a specific topic from new perspectives.
- "Interview Articles", limited to two papers per year, covering interviews with milestone personalities in the field of dependence modeling.

The journal topics include (but are not limited to):
- Copula methods
- Multivariate distributions
- Estimation and goodness-of-fit tests
- Measures of association
- Quantitative risk management
- Risk measures and stochastic orders
- Time series
- Environmental sciences
- Computational methods and software
- Extreme-value theory
- Limit laws
- Mass Transportations