Title: Estimating Contribution of Training Datasets using Shapley Values in Data-scale for Visual Recognition
Authors: Takayuki Semitsu, M. Nakamura, Shotaro Ishigami, Toru Aoki, Teng-Yok Lee, Yoshimi Isu
Published in: 2021 17th International Conference on Machine Vision and Applications (MVA), 2021-07-25
DOI: https://doi.org/10.23919/MVA51890.2021.9511396
Citations: 1
Abstract
In this paper, we propose a method for measuring the contributions of multiple training datasets, i.e., how much each dataset contributes to improving the accuracy of a model. Our method is based on the Shapley value, which measures a contribution as the difference in accuracy between models trained with and without the dataset. Unlike previous methods, our method first converts accuracy into a data-scale measurement using a fitted logarithmic curve. This makes the calculation fair: each trial is evaluated not by its improvement in accuracy, but by the amount of data needed to achieve that improvement. As a result, our method avoids overestimating contributions in small-data cases. To evaluate the proposed method, we trained models for person re-identification on combinations of datasets and calculated the contribution of each dataset. The results show that the proposed metric effectively reduces overestimation in small-data cases, while the contributions retain desirable properties that follow from the definition of the Shapley value, such as local accuracy and additivity. We also propose normalizing the data-scale Shapley value of a dataset by its actual number of instances, which indicates the intrinsic importance of the dataset per instance.
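The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the curve form accuracy = a·ln(n) + b, the choice of zero data-scale value for the empty set, and all function names, accuracies, and dataset sizes are assumptions made up for illustration. The sketch fits a log curve, inverts it to express accuracy as an equivalent number of samples, computes exact Shapley values over three hypothetical datasets, and normalizes each by its number of instances.

```python
import math
from itertools import combinations

def fit_log_curve(sizes, accuracies):
    """Least-squares fit of accuracy = a * ln(n) + b (assumed curve form)."""
    xs = [math.log(n) for n in sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(accuracies) / len(accuracies)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accuracies))
    a = sxy / sxx
    return a, mean_y - a * mean_x

def to_data_scale(accuracy, a, b):
    """Invert the fitted curve: sample count that would yield this accuracy."""
    return math.exp((accuracy - b) / a)

def shapley_values(players, value):
    """Exact Shapley values by enumerating every subset of the other players."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi

# Hypothetical learning-curve points (training-set size, accuracy); not from the paper.
curve_sizes = [1000, 2000, 4000, 8000]
curve_acc = [0.50, 0.58, 0.66, 0.74]
a, b = fit_log_curve(curve_sizes, curve_acc)

# Hypothetical accuracies of models trained on each combination of datasets A, B, C.
players = ["A", "B", "C"]
subset_acc = {
    frozenset("A"): 0.58, frozenset("B"): 0.50, frozenset("C"): 0.54,
    frozenset("AB"): 0.64, frozenset("AC"): 0.66, frozenset("BC"): 0.60,
    frozenset("ABC"): 0.70,
}

def value(subset):
    """Data-scale value of a dataset combination."""
    if not subset:
        return 0.0  # assumption: training on no data is worth zero samples
    return to_data_scale(subset_acc[frozenset(subset)], a, b)

phi = shapley_values(players, value)

# Hypothetical instance counts, used for the per-instance normalization.
dataset_sizes = {"A": 3000, "B": 1000, "C": 1500}
per_instance = {p: phi[p] / dataset_sizes[p] for p in players}

for p in players:
    print(f"{p}: data-scale Shapley = {phi[p]:.1f}, per instance = {per_instance[p]:.3f}")
```

Because the value function is expressed in samples rather than accuracy, the Shapley values sum to the data-scale value of the full combination, which is the local accuracy property mentioned above. Exact enumeration is exponential in the number of datasets, so this form is only practical for a handful of them.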