Rizal Broer Bahaweres, Mutia Salsabila, Nurul Faizah Rozy, I. Hermadi, A. Suroso, Y. Arkeman
2022 10th International Conference on Cyber and IT Service Management (CITSM), published 2022-09-20. DOI: 10.1109/CITSM56380.2022.9935831 (https://doi.org/10.1109/CITSM56380.2022.9935831)
Combining PCA and SMOTE for Software Defect Prediction with Visual Analytics Approach
Software defect prediction (SDP) improves software quality by enabling more efficient use of time and resources. Research on improving the performance or accuracy of SDP models is therefore ongoing. However, SDP datasets often have a large number of attributes and an imbalance between defective and non-defective class samples, which reduces classification performance. In this study, we propose combining PCA with SMOTE to produce better-performing models, together with a visualization approach that represents the resulting model to support understanding and analysis in future modeling. The SVM, RF, NB, and NN classification algorithms, each tuned to its best parameters, are evaluated on Recall, AUC, and G-Mean across five different NASA datasets. The authors then compare the proposed model against the PCA model without SMOTE to determine whether performance has improved. Visual analytics is built for every stage of model building once the model has been created, so that it provides confidence and helps users understand and gain insights from the resulting model. The findings indicate that the proposed method outperforms the model using PCA alone by 60%, 47%, and 16% on average for Recall, AUC, and G-Mean, respectively. SMOTE is shown to overcome the effect of class imbalance, increasing the G-Mean score in all models, and NN is the best algorithm in the proposed model based on the average score.
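To make the oversampling step and the evaluation metrics concrete, the following is a minimal pure-Python sketch and not the authors' implementation. It assumes the standard formulation of SMOTE (a synthetic minority sample is an interpolation between a minority sample and one of its k nearest minority neighbours) and the usual definition of G-Mean as the square root of recall times specificity. The function names `smote_sketch` and `recall_and_g_mean`, the default `k=3`, and the toy data are hypothetical and chosen only for illustration; PCA and classifier tuning are omitted.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Assumes at least two minority samples, each a list of floats.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the chosen base sample.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def recall_and_g_mean(y_true, y_pred):
    """Recall on the defective class (label 1) and G-Mean.

    Assumes both classes occur in y_true, so neither denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return recall, math.sqrt(recall * specificity)


# Toy data: 2 defective modules out of 10, and a model that predicts
# "non-defective" for everything (80% accurate, but no defects found).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_all_negative = [0] * 10
print(recall_and_g_mean(y_true, y_all_negative))

# Oversample a tiny minority class with five synthetic points.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(smote_sketch(minority, n_new=5, k=2))
```

The all-negative predictor illustrates why Recall and G-Mean, rather than accuracy, are informative on imbalanced defect data: a model that never flags a defect can still look accurate while both metrics drop to zero.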