A Mutual Information-Based Hybrid Feature Selection Method for Software Cost Estimation Using Feature Clustering
Qin Liu, Shihai Shi, Hongming Zhu, Jiakai Xiao
2014 IEEE 38th Annual Computer Software and Applications Conference, 2014-07-21
DOI: 10.1109/COMPSAC.2014.99 (https://doi.org/10.1109/COMPSAC.2014.99)
Citations: 10
Abstract
Feature selection methods aim to find the subset of the original features that yields the most accurate prediction. So far, supervised and unsupervised feature selection methods have been discussed and developed separately. However, for some data sets the two can be combined into a hybrid feature selection method. In this paper, we propose a mutual information-based (MI-based) hybrid feature selection method using feature clustering. In the unsupervised learning stage, agglomerative hierarchical clustering groups the original features into several clusters based on their similarity to each other. Then, in the supervised learning stage, the feature in each cluster that maximizes the similarity with the response feature, which represents the class label, is selected as that cluster's representative feature. These representative features compose the feature subset. Our contributions are 1) the newly proposed feature selection method and 2) the application of feature clustering to software cost estimation. The proposed method takes a wrapper approach, so it can evaluate the prediction performance of each candidate feature subset to determine the optimal one. Experimental results in software cost estimation show that the proposed method outperforms the supervised feature selection methods INMIFS and mRMRFS by at least 11.5% and 14.8% on the ISBSG R8 and Desharnais data sets in terms of the PRED(0.25) value.
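The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions made for illustration only: features are already discrete, similarity is mutual information normalized by the smaller entropy, clusters are merged with average linkage, and the number of clusters `k` is fixed in advance (the wrapper step that would compare several subsets is omitted). The function names and the toy data are hypothetical. `pred` follows the common definition of PRED(0.25) as the fraction of estimates whose relative error is at most 25%.

```python
import math
from collections import Counter

def entropy(x):
    """Plug-in entropy (nats) of a discrete sequence."""
    n = len(x)
    return -sum((c / n) * math.log(c / n) for c in Counter(x).values())

def mutual_information(x, y):
    """Plug-in mutual information (nats) of two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def similarity(x, y):
    """Normalized MI; an assumed similarity measure, not necessarily the paper's."""
    denom = min(entropy(x), entropy(y))
    return mutual_information(x, y) / denom if denom > 0 else 0.0

def cluster_features(features, k):
    """Unsupervised stage: average-linkage agglomerative clustering of feature names."""
    clusters = [[name] for name in features]

    def linkage(a, b):
        total = sum(similarity(features[i], features[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        # Merge the pair of clusters with the highest average similarity.
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = max(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] += clusters.pop(j)
    return clusters

def select_representatives(features, response, k):
    """Supervised stage: keep, per cluster, the feature with the highest MI to the response."""
    return [max(cluster, key=lambda name: mutual_information(features[name], response))
            for cluster in cluster_features(features, k)]

def pred(actual, predicted, level=0.25):
    """PRED(level): fraction of estimates with relative error <= level."""
    hits = sum(abs(a - p) / a <= level for a, p in zip(actual, predicted))
    return hits / len(actual)

# Hypothetical toy data: three discrete project features and a discretized effort class.
features = {
    "size_band":   [0, 0, 1, 1, 2, 2, 0, 1],
    "size_coarse": [0, 0, 1, 1, 1, 1, 0, 1],  # coarser copy of size_band
    "language":    [0, 1, 0, 1, 0, 1, 0, 1],
}
effort = [10, 10, 20, 20, 30, 30, 10, 20]

print(select_representatives(features, effort, k=2))
print(pred([100, 200, 400, 800], [110, 260, 390, 500]))
```

Real cost-estimation data such as ISBSG R8 or Desharnais contain continuous attributes, so a practical version would need discretization or a continuous MI estimator, plus a wrapper loop that scores candidate subsets with an actual estimator.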