Finding Key Training Data by Calculating Influence Score
Jiahao Xu, Fan Zhang, S. Khan
Proceedings of the 6th International Conference on Computer Science and Application Engineering, 2022-10-21
DOI: 10.1145/3565387.3565403 (https://doi.org/10.1145/3565387.3565403)
Abstract
The complexity and opacity of decision models, together with growing data volume requirements, make it attractive to reduce data volume and improve model interpretability by selecting key data. In this paper, we propose InfSort, an influence function-based method for sorting and pruning training data, and demonstrate that the key data it selects outperform an equal number of other data. We also find that a data point's importance is positively correlated with the speed and stability of the loss decrease, and that key data accelerate model convergence. In addition, we develop CGT, a method that mitigates the risk of overfitting by controlling the worst-case distribution of the data. Experimental results show that our method is effective and efficient on emotion recognition tasks.
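The abstract does not spell out how InfSort computes its influence scores, so the following is only a minimal sketch of the general influence-function idea it builds on: score each training example by how strongly its loss gradient aligns with the gradient of a reference (validation) example, then sort and prune by that score. The logistic-regression model, the first-order approximation (the Hessian term of the full influence function is omitted), and all function names here are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_loss(w, x, y):
    # Gradient of the logistic loss for a single example (label y in {0, 1}).
    p = sigmoid(x @ w)
    return (p - y) * x

def influence_scores(w, X_train, y_train, x_ref, y_ref):
    # First-order influence proxy: dot product of each training example's
    # gradient with the reference example's gradient. (A full influence
    # function would also apply the inverse Hessian; omitted for brevity.)
    g_ref = grad_loss(w, x_ref, y_ref)
    return np.array([grad_loss(w, X_train[i], y_train[i]) @ g_ref
                     for i in range(len(X_train))])

# Toy data: 20 examples, 3 features, label determined by the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = (X[:, 0] > 0).astype(float)

# A few crude SGD steps so the weights are non-trivial.
w = np.zeros(3)
for i in range(20):
    w -= 0.1 * grad_loss(w, X[i], y[i])

scores = influence_scores(w, X, y, X[0], y[0])
order = np.argsort(-scores)  # most influential first; prune from the tail
```

Under this sketch, "key data" would be the prefix of `order`, and pruning keeps only those top-ranked examples when retraining.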