Tackling Feature Selection Problems with Genetic Algorithms in Software Defect Prediction for Optimization

2020 International Conference on Informatics, Multimedia, Cyber and Information System (ICIMCIS) Pub Date : 2020-11-19 DOI:10.1109/ICIMCIS51567.2020.9354282

Rizal Broer Bahaweres, Arif Imam Suroso, Alam Wahyu Hutomo, Indra Permana Solihin, I. Hermadi, Y. Arkeman


Citations: 5

Abstract

Software defect prediction improves software quality by finding and tracking defective modules, which helps reduce costs during testing. Machine learning methods can be applied to predict defects in each software module. However, software defect prediction datasets typically have two problems: class imbalance, with very few defective modules compared to non-defective ones, and noisy attributes caused by irrelevant features. Together, these problems cause overfitting and biased classification results, which significantly reduce the performance of the machine learning model. In this study, we propose combining bagging techniques and genetic algorithms (GA) to improve the classification performance of machine learning models for software defect prediction based on Logistic Regression, Naive Bayes, SVM, KNN, and Decision Tree. The two techniques address the two main problems respectively: bagging handles class imbalance, and the genetic algorithm handles feature selection. We used 6 NASA Promise datasets to evaluate classification performance based on AUC and G-Mean values. The results using 10-fold cross-validation show that the proposed method improves classification performance compared to the original algorithms. The Decision Tree shows the highest performance on 3 of the datasets tested, with the highest value of 94.61% on the KC4 dataset. We also compare GA performance with another nature-inspired algorithm, Particle Swarm Optimization (PSO). The results show that all machine learning models with GA outperform their counterparts with PSO.
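The abstract does not give the GA configuration, so the following is only a minimal sketch of the two ideas it names: the G-Mean metric and GA-based feature selection. The bit-mask encoding, tournament selection, one-point crossover, elitism, and every parameter value are assumptions made for illustration, not the authors' settings. The `fitness` callback is hypothetical: in a setup like the one described, it would train a bagged classifier on the selected columns and return a cross-validated AUC or G-Mean.

```python
import math
import random


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity (label 1 = defective)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def ga_select(n_features, fitness, pop_size=20, generations=30,
              crossover_rate=0.9, mutation_rate=0.05, seed=0):
    """Search for a 0/1 feature mask that maximizes fitness(mask).

    All default parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in population]

        def tournament():
            # Pick two individuals at random and keep the fitter one.
            a, b = rng.sample(scored, 2)
            return (a if a[0] >= b[0] else b)[1]

        next_pop = [best[:]]  # elitism: the best mask always survives
        while len(next_pop) < pop_size:
            p1, p2 = tournament(), tournament()
            if n_features > 1 and rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation, applied independently to each gene.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_pop.append(child)

        population = next_pop
        best = max(population, key=fitness)
    return best


# Toy fitness for demonstration only: reward masks close to a made-up target.
TARGET = [1, 0, 1, 1, 0, 0, 1, 0]


def toy_fitness(mask):
    return sum(1 for m, t in zip(mask, TARGET) if m == t)


selected = ga_select(len(TARGET), toy_fitness)
```

G-Mean is a natural choice under class imbalance because a model that labels every module as non-defective has zero sensitivity and therefore a G-Mean of 0, however high its accuracy looks.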

