Comparison of rank-based vs. score-based aggregation for ensemble gene selection

D. Dittman, T. Khoshgoftaar, Randall Wald, Amri Napolitano
Published in: 2013 IEEE 14th International Conference on Information Reuse & Integration (IRI), 2013-10-24
DOI: 10.1109/IRI.2013.6642476 (https://doi.org/10.1109/IRI.2013.6642476)
Citations: 3

Abstract

Comparison of rank-based vs. score-based aggregation for ensemble gene selection
Gene selection is an essential step in much bioinformatics research, needed to handle the thousands or tens of thousands of gene expression levels generated by gene microarrays. It is especially important that gene selection be robust, producing consistent results even when the dataset changes. Ensemble gene selection can help improve robustness by combining gene rankings from multiple gene selection techniques into a single gene subset. Typically this is done by performing multiple runs of feature (gene) selection, finding each gene's rank within each run, and aggregating these ranks into a final ranked list. However, another option exists: instead of ranking each list and then aggregating, the raw scores produced by the gene ranking algorithms (which would normally be compared to generate a ranking) are aggregated directly, and these aggregate scores are used to create a final ranking. This can produce a different final ranking, because score aggregation preserves how close or far apart adjacent genes (i.e., those with no genes between them) are, whereas rank aggregation treats every adjacent pair as equally spaced. Score aggregation can also reduce computation time, because the ranking step takes place only once rather than separately for each list being aggregated. In this experiment, we use eleven DNA microarray datasets, nine univariate feature selection techniques, and twelve feature subset sizes to compare these two approaches on a commonly used aggregation technique: mean aggregation. The results show that for seven of the nine feature selection techniques the two approaches are strongly similar, although the resulting feature subsets are not identical. The other two techniques, however, show high levels of diversity between the two approaches. Further research is therefore required to determine the relative abilities of the two approaches.
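The difference between the two aggregation orders can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy scores, and the assumptions that higher scores are better and that scores from different runs are on comparable scales are all hypothetical choices made for illustration.

```python
from statistics import mean


def ranks_from_scores(scores):
    """Rank genes by descending score (rank 1 = best); ties keep input order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for position, gene in enumerate(order, start=1):
        ranks[gene] = position
    return ranks


def rank_based_mean(score_lists):
    """Rank every run separately, then average each gene's ranks (lower is better)."""
    rank_lists = [ranks_from_scores(scores) for scores in score_lists]
    return [mean(column) for column in zip(*rank_lists)]


def score_based_mean(score_lists):
    """Average each gene's raw scores (higher is better); ranking happens once, later."""
    return [mean(column) for column in zip(*score_lists)]


def top_k(values, k, higher_is_better):
    """Return the indices of the k best genes under the given ordering."""
    sign = -1 if higher_is_better else 1
    return sorted(range(len(values)), key=lambda i: sign * values[i])[:k]


# Hypothetical scores: three runs (rows) over four genes (columns).
runs = [
    [0.90, 0.89, 0.10, 0.05],
    [0.50, 0.95, 0.20, 0.10],
    [0.92, 0.91, 0.30, 0.02],
]

rank_top1 = top_k(rank_based_mean(runs), 1, higher_is_better=False)
score_top1 = top_k(score_based_mean(runs), 1, higher_is_better=True)
print("rank-based top-1:", rank_top1)    # [0]
print("score-based top-1:", score_top1)  # [1]
```

In this toy data, gene 0 narrowly beats gene 1 in two runs and loses by a wide margin in the third. Rank aggregation only sees "two wins to one" and prefers gene 0, while score aggregation keeps the margins and prefers gene 1. The top-2 subsets still contain the same two genes, which mirrors the kind of "similar but not identical" outcome the abstract reports.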