A Mutual Information-Based Hybrid Feature Selection Method for Software Cost Estimation Using Feature Clustering

Qin Liu, Shihai Shi, Hongming Zhu, Jiakai Xiao
{"title":"A Mutual Information-Based Hybrid Feature Selection Method for Software Cost Estimation Using Feature Clustering","authors":"Qin Liu, Shihai Shi, Hongming Zhu, Jiakai Xiao","doi":"10.1109/COMPSAC.2014.99","DOIUrl":null,"url":null,"abstract":"Feature selection methods are designed to obtain the optimal feature subset from the original features to give the most accurate prediction. So far, supervised and unsupervised feature selection methods have been discussed and developed separately. However, these two methods can be combined together as a hybrid feature selection method for some data sets. In this paper, we propose a mutual information-based (MI-based) hybrid feature selection method using feature clustering. In the unsupervised learning stage, the original features are grouped into several clusters based on the feature similarity to each other with agglomerative hierarchical clustering. Then in the supervised learning stage, the feature in each cluster that can maximize the feature similarity with the response feature which represents the class label is selected as the representative feature. These representative features compose the feature subset. Our contribution includes 1)the newly proposed feature selection method and 2)the application of feature clustering for software cost estimation. The proposed method employs wrapper approaches, so it can evaluate the prediction performance of each feature subset to determine the optimal one. The experimental results in software cost estimation demonstrate that the proposed method can outperform at least 11.5% and 14.8% than the supervised feature selection method INMIFS and mRMRFS in ISBSG R8 and Desharnais data set in terms of PRED (0.25) value.","PeriodicalId":106871,"journal":{"name":"2014 IEEE 38th Annual Computer Software and Applications Conference","volume":"52 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 IEEE 38th Annual Computer Software and Applications Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/COMPSAC.2014.99","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 10

Abstract

Feature selection methods are designed to obtain, from the original features, the optimal feature subset that gives the most accurate prediction. So far, supervised and unsupervised feature selection methods have been discussed and developed separately; however, the two can be combined into a hybrid feature selection method for some data sets. In this paper, we propose a mutual information-based (MI-based) hybrid feature selection method using feature clustering. In the unsupervised learning stage, the original features are grouped into several clusters by agglomerative hierarchical clustering, based on their pairwise feature similarity. In the supervised learning stage, the feature in each cluster that maximizes the similarity with the response feature, which represents the class label, is selected as that cluster's representative; these representative features compose the feature subset. Our contributions are 1) the newly proposed feature selection method and 2) the application of feature clustering to software cost estimation. The proposed method employs a wrapper approach, so it can evaluate the prediction performance of each candidate feature subset to determine the optimal one. Experimental results on software cost estimation show that the proposed method outperforms the supervised feature selection methods INMIFS and mRMRFS by at least 11.5% and 14.8% in PRED(0.25) on the ISBSG R8 and Desharnais data sets.
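
The abstract describes a two-stage procedure: an unsupervised stage that clusters features by their pairwise mutual information, and a supervised stage that keeps, from each cluster, the feature most informative about the response, with a wrapper loop scoring each candidate subset by PRED(0.25). The sketch below illustrates that flow only; it is not the authors' implementation. It assumes scikit-learn's nearest-neighbor MI estimator, average-linkage clustering, a plain linear-regression effort predictor inside the wrapper, and a sweep over the number of clusters; all function names (mi_matrix, cluster_features, select_representatives, hybrid_feature_selection) are illustrative.

```python
# Minimal sketch of an MI-based hybrid feature selection with feature clustering.
# Assumptions (not from the paper): average linkage, linear regression as the
# wrapper predictor, and scikit-learn's k-NN mutual information estimator.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from sklearn.feature_selection import mutual_info_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict


def mi_matrix(X):
    """Pairwise mutual information between features (columns of X)."""
    n_features = X.shape[1]
    mi = np.zeros((n_features, n_features))
    for i in range(n_features):
        for j in range(i + 1, n_features):
            mi[i, j] = mi[j, i] = mutual_info_regression(X[:, [i]], X[:, j])[0]
    return mi


def cluster_features(X, n_clusters):
    """Unsupervised stage: agglomerative clustering of features, using
    (max MI - MI) as a dissimilarity so that similar features end up together."""
    mi = mi_matrix(X)
    dist = mi.max() - mi          # high MI -> small distance
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust")


def select_representatives(X, y, labels):
    """Supervised stage: in each cluster keep the feature with the
    largest MI with the response (the effort to be estimated)."""
    relevance = mutual_info_regression(X, y)
    subset = []
    for c in np.unique(labels):
        members = np.where(labels == c)[0]
        subset.append(members[np.argmax(relevance[members])])
    return sorted(subset)


def pred_25(y_true, y_pred):
    """PRED(0.25): fraction of projects whose magnitude of relative
    error |y - y_hat| / y is at most 0.25."""
    mre = np.abs(y_true - y_pred) / y_true
    return np.mean(mre <= 0.25)


def hybrid_feature_selection(X, y):
    """Wrapper loop: sweep the cluster count, evaluate each candidate subset
    with a predictor, and keep the subset with the best PRED(0.25)."""
    best_subset, best_score = None, -np.inf
    for k in range(1, X.shape[1] + 1):
        labels = cluster_features(X, k)
        subset = select_representatives(X, y, labels)
        y_hat = cross_val_predict(LinearRegression(), X[:, subset], y, cv=5)
        score = pred_25(y, y_hat)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score
```

The 5-fold cross-validation and the linear predictor are placeholders; in a cost-estimation setting the wrapper could just as well score subsets with whatever estimator and validation scheme the study uses, as long as PRED(0.25) is computed on out-of-sample predictions.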