The Comparison of Feature Selection Methods in Software Defect Prediction

2020 4th International Conference on Informatics and Computational Sciences (ICICoS) Pub Date : 2020-11-10 DOI:10.1109/ICICoS51170.2020.9299022

Khadijah, Amazona Adorada, P. W. Wirawan, Kabul Kurniawan


Citations: 3

Abstract

One of the goals of software testing is to discover software defects before the software is used by customers. Successful software testing leads to high-quality software. However, exposing defects through software testing is very resource-intensive. Therefore, automated software defect prediction is needed. To build an accurate prediction model, a relevant subset of features must be carefully selected as input to the classifier. This research therefore compares the performance of two feature selection methods: ReliefF, a filter method, and SVM-RFE (Support Vector Machine – Recursive Feature Elimination), an embedded method. Neither method assumes conditional independence among features. SMOTE (Synthetic Minority Oversampling Technique) is first used to balance the training data, and SVM is then applied as the classification algorithm. The experiments are run on a public benchmark, the NASA MDP dataset. The experimental results show that SVM-RFE performs better than ReliefF in terms of g-mean, while ReliefF performs better than SVM-RFE in terms of accuracy. However, with SVM-RFE feature selection, the best classifier performance is achieved with a smaller number of features than with ReliefF. Future research may explore ensemble feature selection methods as an attempt to improve the performance of the resulting classifier in both g-mean and accuracy.
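The abstract reports that the two methods rank differently under g-mean and accuracy. The minimal sketch below illustrates how that can happen on imbalanced data, assuming the common definition of g-mean as the square root of sensitivity times specificity (the abstract does not spell out its formula). The labels and both sets of predictions are hypothetical toy values, not results from the paper or from the NASA MDP dataset.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, tn, fp, fn) for a binary labelling."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Fraction of modules labelled correctly."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def g_mean(y_true, y_pred):
    """Assumed definition: sqrt(sensitivity * specificity)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Hypothetical test set: 10 modules, 2 defective (1) and 8 clean (0).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# Hypothetical model A: labels every module as clean.
model_a = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

# Hypothetical model B: finds both defects but raises 3 false alarms.
model_b = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

for name, pred in (("A", model_a), ("B", model_b)):
    print(name, "accuracy:", accuracy(y_true, pred), "g-mean:", g_mean(y_true, pred))
```

In this toy case model A has the higher accuracy while model B has the higher g-mean, because g-mean drops to zero when no defective module is detected. This is one reason a study on imbalanced defect data may report both metrics.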
