Explaining predictive models using Shapley values and non-parametric vine copulas

Impact factor 0.6, JCR Q4, STATISTICS & PROBABILITY

Dependence Modeling, vol. 9, no. 1, pp. 62-81. Published 2021-01-01. DOI: 10.1515/demo-2021-0103

K. Aas, T. Nagler, Martin Jullum, A. Løland


Citations: 15



Abstract

In this paper the goal is to explain predictions from complex machine learning models. One method that has become very popular during the last few years is Shapley values. The original development of Shapley values for prediction explanation relied on the assumption that the features being described were independent. If the features are in reality dependent, this may lead to incorrect explanations. Hence, there have recently been attempts to appropriately model or estimate the dependence between the features. Although the previously proposed methods clearly outperform the traditional approach that assumes independence, they have their weaknesses. In this paper we propose two new approaches for modelling the dependence between the features. Both approaches are based on vine copulas, which are flexible tools for modelling multivariate non-Gaussian distributions and can characterise a wide range of complex dependencies. The performance of the proposed methods is evaluated on simulated data sets and a real data set. The experiments demonstrate that the vine copula approaches give more accurate approximations to the true Shapley values than their competitors.
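The abstract's central point, that Shapley explanations change when the features are dependent, can be illustrated with a small sketch. It computes exact Shapley values, phi_j = sum over S ⊆ M\{j} of |S|!(|M|-|S|-1)!/|M|! * [v(S ∪ {j}) - v(S)], with the contribution function v(S) = E[f(X) | X_S = x*_S]. Assumptions, none of which come from the paper: a linear model f(x) = beta0 + beta·x and jointly Gaussian features, so the conditional expectation has a closed form. The Gaussian conditional is a stand-in for illustration only, not the vine copula method the paper proposes, and the function names (`solve`, `conditional_mean`, `contribution`, `shapley_values`) are hypothetical. Enumerating every subset is only feasible for a handful of features.

```python
"""Exact Shapley values for a linear model, with and without feature dependence.

Illustrative sketch only: a Gaussian conditional distribution stands in for the
vine copula models of the paper. All function names are hypothetical.
"""
from itertools import combinations
from math import factorial


def solve(a, b):
    """Solve the small linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        # Partial pivoting: bring the row with the largest entry in this column up.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def conditional_mean(mu, sigma, s, x_star):
    """E[X | X_S = x*_S] for X ~ N(mu, sigma), returned as a full-length vector."""
    if not s:
        return list(mu)
    sigma_ss = [[sigma[i][k] for k in s] for i in s]
    # w = Sigma_SS^{-1} (x*_S - mu_S)
    w = solve(sigma_ss, [x_star[i] - mu[i] for i in s])
    return [
        x_star[j] if j in s
        else mu[j] + sum(sigma[j][i] * wi for i, wi in zip(s, w))
        for j in range(len(mu))
    ]


def contribution(beta0, beta, mu, sigma, x_star, s, independent):
    """v(S) = E[f(X) | X_S = x*_S] for the linear model f(x) = beta0 + beta . x."""
    if independent:
        # Independence assumption: features outside S keep their marginal means.
        x = [x_star[j] if j in s else mu[j] for j in range(len(mu))]
    else:
        # Dependence: features outside S take their Gaussian conditional means.
        x = conditional_mean(mu, sigma, s, x_star)
    # f is linear, so E[f(X) | X_S = x*_S] = f(E[X | X_S = x*_S]).
    return beta0 + sum(b * xj for b, xj in zip(beta, x))


def shapley_values(beta0, beta, mu, sigma, x_star, independent=False):
    """Exact Shapley values by enumerating every subset S of the other features."""
    m = len(mu)
    phi = []
    for j in range(m):
        others = [i for i in range(m) if i != j]
        total = 0.0
        for size in range(m):
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for s in combinations(others, size):
                s = list(s)
                with_j = sorted(s + [j])
                total += weight * (
                    contribution(beta0, beta, mu, sigma, x_star, with_j, independent)
                    - contribution(beta0, beta, mu, sigma, x_star, s, independent)
                )
        phi.append(total)
    return phi


if __name__ == "__main__":
    mu = [0.0, 0.0]
    sigma = [[1.0, 0.8], [0.8, 1.0]]  # correlation 0.8 between x1 and x2
    x_star = [1.0, 2.0]
    # The model f(x) = x1 ignores x2 entirely.
    print(shapley_values(0.0, [1.0, 0.0], mu, sigma, x_star, independent=True))
    print(shapley_values(0.0, [1.0, 0.0], mu, sigma, x_star))
```

Under the independence assumption the ignored feature x2 receives zero credit ([1.0, 0.0]). With the dependence-aware contribution function x1 gets roughly 0.2 and x2 roughly 0.8, because observing x2 = 2 already shifts the expected prediction. Both vectors sum to f(x*) - E[f(X)] = 1, so the two assumptions distribute the same total very differently.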


Source journal

Dependence Modeling (STATISTICS & PROBABILITY)

CiteScore: 1.00
Self-citation rate: 0.00%
Review time: 12 weeks

Journal description: The journal Dependence Modeling aims to provide a medium for exchanging results and ideas in the area of multivariate dependence modeling. It is an open access, fully peer-reviewed journal providing readers with free, instant, and permanent access to all content worldwide. Dependence Modeling is listed by Web of Science (Emerging Sources Citation Index), Scopus, MathSciNet and Zentralblatt Math.

The journal presents different types of articles:
-"Research Articles" on fundamental theoretical aspects, as well as on significant applications in science, engineering, economics, finance, insurance and other fields.
-"Review Articles", which present the existing literature on a specific topic from new perspectives.
-"Interview Articles", limited to two papers per year, covering interviews with milestone personalities in the field of Dependence Modeling.

The journal topics include (but are not limited to):
-Copula methods
-Multivariate distributions
-Estimation and goodness-of-fit tests
-Measures of association
-Quantitative risk management
-Risk measures and stochastic orders
-Time series
-Environmental sciences
-Computational methods and software
-Extreme-value theory
-Limit laws
-Mass transportation