Comparing Two New Gene Selection Ensemble Approaches with the Commonly-Used Approach

D. Dittman, T. Khoshgoftaar, Randall Wald, Amri Napolitano
2012 11th International Conference on Machine Learning and Applications, published 2012-12-12
DOI: 10.1109/ICMLA.2012.175 (https://doi.org/10.1109/ICMLA.2012.175)
Citations: 21

Abstract

Ensemble feature selection has recently attracted interest from researchers, especially in bioinformatics. Compared with a single feature selection method, an ensemble offers more stable and more useful feature (gene) subsets along with comparable or better classification performance. However, existing work on ensemble feature selection has concentrated on data diversity (applying a single feature selection method to multiple datasets or to samples drawn from a single dataset), neglecting two other potential sources of diversity. We present two new approaches to gene selection based on these sources: functional diversity (applying multiple feature selection techniques to a single dataset) and hybrid (a combination of data and functional diversity). To demonstrate the value of these new approaches, we measure the similarity between the feature subsets created by each of the three approaches across twenty-six datasets and ten feature selection techniques (or an ensemble of these techniques, as appropriate). We also compare the classification performance of models built using each of the three ensembles. Our results show that the similarity between the functional diversity and hybrid approaches is much higher than the similarity between either of them and data diversity, and that the distinction between data diversity and our new approaches is particularly strong for hard-to-learn datasets. In addition to being the most similar to each other, the functional diversity and hybrid approaches generally achieve better classification performance than data diversity, especially when selecting small feature subsets. These results demonstrate both that the new approaches can produce a different feature subset than the existing approach and that the resulting novel feature subset is potentially of interest to researchers. To our knowledge, no previous study has explored these new approaches to ensemble feature selection within the domain of bioinformatics.
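
The three ensemble approaches and the subset comparison can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the two toy rankers stand in for the ten feature selection techniques, ranks are combined by mean rank, the data diversity and hybrid variants use a stratified bootstrap, the hybrid variant gives each ranker its own bootstrap sample, and subset similarity is measured with the Jaccard index.

```python
import numpy as np

def mean_difference_ranks(X, y):
    """Toy ranker (assumed): absolute difference of class means, rank 0 = best."""
    scores = np.abs(X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0))
    return np.argsort(np.argsort(-scores))

def signal_to_noise_ranks(X, y):
    """Toy ranker (assumed): mean difference scaled by the summed class std devs."""
    pos, neg = X[y == 1], X[y == 0]
    spread = pos.std(axis=0) + neg.std(axis=0) + 1e-12  # avoid division by zero
    scores = np.abs(pos.mean(axis=0) - neg.mean(axis=0)) / spread
    return np.argsort(np.argsort(-scores))

def stratified_bootstrap(X, y, rng):
    """Resample with replacement inside each class so both classes survive."""
    idx = np.concatenate([
        rng.choice(np.flatnonzero(y == label), size=int((y == label).sum()), replace=True)
        for label in (0, 1)
    ])
    return X[idx], y[idx]

def aggregate_top_k(rank_lists, k):
    """Mean-rank aggregation (assumed); returns the k best gene indices as a set."""
    mean_rank = np.mean(rank_lists, axis=0)
    return set(np.argsort(mean_rank, kind="stable")[:k].tolist())

def data_diversity(X, y, ranker, k, n_samples, rng):
    """One ranker applied to several bootstrap samples of one dataset."""
    return aggregate_top_k(
        [ranker(*stratified_bootstrap(X, y, rng)) for _ in range(n_samples)], k)

def functional_diversity(X, y, rankers, k):
    """Several rankers applied to the same, unsampled dataset."""
    return aggregate_top_k([ranker(X, y) for ranker in rankers], k)

def hybrid(X, y, rankers, k, rng):
    """Several rankers, each applied to its own bootstrap sample (assumed design)."""
    return aggregate_top_k(
        [ranker(*stratified_bootstrap(X, y, rng)) for ranker in rankers], k)

def jaccard(a, b):
    """Jaccard similarity of two gene subsets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Synthetic data: 40 samples, 50 genes, the first 5 genes shifted in class 1.
rng = np.random.default_rng(0)
y = np.array([0] * 20 + [1] * 20)
X = rng.normal(size=(40, 50))
X[y == 1, :5] += 2.0

rankers = [mean_difference_ranks, signal_to_noise_ranks]
subsets = {
    "data": data_diversity(X, y, mean_difference_ranks, k=10, n_samples=5, rng=rng),
    "functional": functional_diversity(X, y, rankers, k=10),
    "hybrid": hybrid(X, y, rankers, k=10, rng=rng),
}
for first, second in [("data", "functional"), ("data", "hybrid"), ("functional", "hybrid")]:
    print(first, "vs", second, round(jaccard(subsets[first], subsets[second]), 3))
```

The printed values depend on the synthetic data and the random seed, so they say nothing about the paper's findings; the sketch only shows where each kind of diversity enters the selection pipeline.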