FECAR: A Feature Selection Framework for Software Defect Prediction
Shulong Liu, Xiang Chen, Wangshu Liu, Jiaqiang Chen, Qing Gu, Daoxu Chen
2014 IEEE 38th Annual Computer Software and Applications Conference, 2014-07-21
DOI: 10.1109/COMPSAC.2014.66 (https://doi.org/10.1109/COMPSAC.2014.66)
Software defect prediction classifies new software entities as either buggy or clean. However, the effectiveness of existing methods is reduced by irrelevant and redundant features. In this paper, we propose FECAR, a new feature selection framework that uses Feature Clustering And feature Ranking. The framework first partitions the original features into k clusters based on an FF-Correlation (feature-feature correlation) measure. It then selects relevant features from each cluster based on an FC-Relevance (feature-class relevance) measure. In our empirical study, we choose Symmetric Uncertainty as the FF-Correlation measure, and Information Gain, Chi-Square, and Relief as three different FC-Relevance measures. We implemented the framework and performed empirical studies on real-world projects from Eclipse and NASA to investigate the redundancy rate and the performance of the trained defect predictors. The final results verify the effectiveness of the proposed framework and provide a guideline for achieving cost-effective feature selection with it.
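The two-stage idea described above (cluster features by FF-Correlation, then rank within each cluster by FC-Relevance) can be illustrated with a minimal Python sketch. It uses Symmetric Uncertainty for FF-Correlation and Information Gain for FC-Relevance, as named in the abstract. Everything else is an assumption made for illustration: the abstract does not name the clustering algorithm, so a simple seed-based assignment stands in for it; the function names, the parameter m (features kept per cluster), and the requirement that features are already discretized are all hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(x, y):
    """IG(X; Y) = H(X) + H(Y) - H(X, Y) for two discrete sequences."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)); 0 if both are constant."""
    denom = entropy(x) + entropy(y)
    return 2.0 * information_gain(x, y) / denom if denom > 0 else 0.0


def fecar_sketch(features, labels, k, m):
    """Illustrative cluster-then-rank selection (not the authors' code).

    features: dict mapping feature name -> list of discrete values
    labels:   list of class labels (e.g. 0 = clean, 1 = buggy)
    k:        number of clusters, assumed to be <= number of features
    m:        number of features kept per cluster (an assumed parameter)
    """
    # FC-Relevance: Information Gain of each feature with the class.
    relevance = {f: information_gain(col, labels) for f, col in features.items()}

    def su(a, b):
        return symmetric_uncertainty(features[a], features[b])

    # Assumed clustering stand-in: start from the most relevant feature,
    # then repeatedly add the feature least correlated with current seeds.
    seeds = [max(features, key=relevance.get)]
    while len(seeds) < k:
        rest = [f for f in features if f not in seeds]
        seeds.append(min(rest, key=lambda f: max(su(f, s) for s in seeds)))

    # FF-Correlation: assign each feature to the seed it correlates with most.
    clusters = {s: [] for s in seeds}
    for f in features:
        clusters[max(seeds, key=lambda s: su(f, s))].append(f)

    # Ranking: keep the m most relevant features from each cluster.
    selected = []
    for members in clusters.values():
        ranked = sorted(members, key=relevance.get, reverse=True)
        selected.extend(ranked[:m])
    return selected


if __name__ == "__main__":
    # Toy, already-discretized data; the feature names are made up.
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    features = {
        "loc_a": [0, 0, 0, 0, 1, 1, 1, 1],
        "loc_b": [0, 0, 0, 0, 1, 1, 1, 1],  # redundant copy of loc_a
        "cplx":  [0, 0, 0, 0, 0, 1, 1, 1],
        "noise": [0, 1, 0, 1, 0, 1, 0, 1],
    }
    print(fecar_sketch(features, labels, k=2, m=1))
```

In this sketch, redundant features tend to land in the same cluster, so taking only the top-ranked members of each cluster limits redundancy among the selected features. Because every cluster contributes, a cluster of weakly relevant features still yields its best member, which is why the number of features taken per cluster is a tuning choice.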