Explainability of SMOTE Based Oversampling for Imbalanced Dataset Problems
Aum Patil, Aman Framewala, F. Kazi
2020 3rd International Conference on Information and Computer Technologies (ICICT), published 2020-03-01
DOI: 10.1109/ICICT50521.2020.00015
Since the advent of Artificial Intelligence (AI), imbalanced datasets and the lack of interpretability of complex AI models have been matters of concern for the research community. Imbalanced datasets contain a very low proportion of one class (the minority class) and a very large proportion of another (the majority class). Although the minority class is quantitatively underrepresented, it carries high qualitative importance, because the cost of misclassifying a minority instance in such domains is very high. This paper presents a novel solution to the imbalanced-dataset problem that uses the proven resampling method Synthetic Minority Oversampling Technique (SMOTE). The interpretability of this approach is then demonstrated with eXplainable AI (XAI) techniques such as LRP, SHAP, and LIME. State-of-the-art models, including deep learning and boosting classifiers, were trained to classify fraud instances with high accuracy and were shown to be reliable by producing explanations for their predictions. The confusion matrices and explanations showcase the excellent performance and reliability of the models.
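The abstract does not describe how SMOTE was implemented, so the following is only a minimal sketch of the interpolation idea behind SMOTE: each synthetic point is placed at a random position on the line segment between a minority sample and one of its k nearest minority neighbours. The function name `smote_sketch`, its parameters, and the toy data are hypothetical and are not taken from the paper.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority points by SMOTE-style interpolation.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Choose one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # Interpolate: base + gap * (neighbour - base), with gap in [0, 1).
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with two features (made-up values).
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote_sketch(minority, n_new=6, k=2)
```

Because every synthetic point lies between two existing minority samples, the new points stay inside the region the minority class already occupies. In practice a maintained library implementation would normally be used, and oversampling would be applied to the training split only.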