Explaining predictive models using Shapley values and non-parametric vine copulas
K. Aas, T. Nagler, Martin Jullum, A. Løland
Dependence Modeling, vol. 9, no. 1, pp. 62-81, 2021. DOI: 10.1515/demo-2021-0103
Abstract
In this paper the goal is to explain predictions from complex machine learning models. One method that has become very popular in recent years is Shapley values. The original development of Shapley values for prediction explanation relied on the assumption that the features were independent. If the features are in reality dependent, this may lead to incorrect explanations. Hence, there have recently been attempts to model or estimate the dependence between the features appropriately. Although the previously proposed methods clearly outperform the traditional approach that assumes independence, they have their weaknesses. In this paper we propose two new approaches for modelling the dependence between the features. Both approaches are based on vine copulas, which are flexible tools for modelling multivariate non-Gaussian distributions and can characterise a wide range of complex dependencies. The performance of the proposed methods is evaluated on simulated data sets and a real data set. The experiments demonstrate that the vine copula approaches give more accurate approximations to the true Shapley values than their competitors.
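The abstract's central point is that a Shapley explanation needs the expected model output when only a subset S of the features is known. We assume the usual contribution function v(S) = E[f(x) | x_S = x*_S]. Feature j's attribution is then the weighted average of v(S ∪ {j}) - v(S) over all subsets S that exclude j, with weights |S|!(d-|S|-1)!/d!.

The sketch below is our own illustration and is not the paper's method. It computes exact Shapley values for a hypothetical linear model with Gaussian features in two ways:
- under the independence assumption, where unobserved features are replaced by their means;
- with the Gaussian conditional mean E[x_j | x_S] = mu_j + Sigma_{jS} Sigma_SS^{-1}(x*_S - mu_S).

The Gaussian model is only a simple stand-in for the non-parametric vine copula models the paper proposes. For a non-linear model, v(S) would typically be approximated by averaging f over samples from the estimated conditional distribution. All names, coefficients and the covariance matrix are assumptions chosen for illustration. Enumerating every subset is only practical for small d.

```python
from itertools import combinations
from math import factorial

# Hypothetical illustration (not from the paper): linear model
# f(x) = x0 + 0*x1 + 2*x2, where x1 is unused by f but correlated with x0.
BETA = [1.0, 0.0, 2.0]
MU = [0.0, 0.0, 0.0]
SIGMA = [[1.0, 0.8, 0.0],
         [0.8, 1.0, 0.0],
         [0.0, 0.0, 1.0]]
X_STAR = [1.0, 1.0, 1.0]  # the observation whose prediction is explained


def solve(A, b):
    """Solve the small dense system A y = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    y = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        y[i] = (M[i][n] - sum(M[i][j] * y[j] for j in range(i + 1, n))) / M[i][i]
    return y


def expected_features(S, x_star, mu, sigma, conditional):
    """Feature vector used to evaluate v(S): x*_j for j in S; for j not in S, either
    the Gaussian conditional mean E[x_j | x_S = x*_S] or, under independence, mu_j."""
    S = list(S)
    out = list(mu)
    for j in S:
        out[j] = x_star[j]
    if conditional and S:
        sigma_ss = [[sigma[a][c] for c in S] for a in S]
        w = solve(sigma_ss, [x_star[k] - mu[k] for k in S])  # Sigma_SS^{-1} (x*_S - mu_S)
        for j in range(len(mu)):
            if j not in S:
                out[j] = mu[j] + sum(sigma[j][k] * w[i] for i, k in enumerate(S))
    return out


def shapley_values(beta, intercept, x_star, mu, sigma, conditional):
    """Exact Shapley values for a linear model, enumerating all feature subsets."""
    d = len(beta)

    def v(S):
        # For a linear f, E[f(x) | x_S] = f(E[x | x_S]), so evaluating f at the
        # expected feature vector gives the contribution function exactly.
        x = expected_features(S, x_star, mu, sigma, conditional)
        return intercept + sum(bj * xj for bj, xj in zip(beta, x))

    phi = []
    for j in range(d):
        others = [k for k in range(d) if k != j]
        total = 0.0
        for size in range(d):
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for S in combinations(others, size):
                total += weight * (v(S + (j,)) - v(S))
        phi.append(total)
    return phi


phi_ind = shapley_values(BETA, 0.0, X_STAR, MU, SIGMA, conditional=False)
phi_cond = shapley_values(BETA, 0.0, X_STAR, MU, SIGMA, conditional=True)
print([round(p, 6) for p in phi_ind])   # -> [1.0, 0.0, 2.0]
print([round(p, 6) for p in phi_cond])  # -> [0.6, 0.4, 2.0]
```

In this toy setup the model never uses x1, but x1 is correlated with x0. Under the independence assumption x1 receives zero attribution. With the conditional contribution function, x0's credit is split between x0 and x1 (0.6 and 0.4). Both versions still sum to f(x*) - E[f(x)] = 3.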
About the journal:
The journal Dependence Modeling aims to provide a medium for exchanging results and ideas in the area of multivariate dependence modeling. It is an open access, fully peer-reviewed journal that provides readers with free, instant, and permanent access to all content worldwide. Dependence Modeling is listed by Web of Science (Emerging Sources Citation Index), Scopus, MathSciNet and Zentralblatt Math.
The journal presents different types of articles:
- "Research Articles" on fundamental theoretical aspects, as well as on significant applications in science, engineering, economics, finance, insurance and other fields.
- "Review Articles", which present the existing literature on a specific topic from new perspectives.
- "Interview Articles", limited to two papers per year, covering interviews with milestone personalities in the field of Dependence Modeling.
The journal topics include (but are not limited to):
- Copula methods
- Multivariate distributions
- Estimation and goodness-of-fit tests
- Measures of association
- Quantitative risk management
- Risk measures and stochastic orders
- Time series
- Environmental sciences
- Computational methods and software
- Extreme-value theory
- Limit laws
- Mass transportation