Comparison of rank-based vs. score-based aggregation for ensemble gene selection

2013 IEEE 14th International Conference on Information Reuse & Integration (IRI)
Pub Date: 2013-10-24
DOI: 10.1109/IRI.2013.6642476

D. Dittman, T. Khoshgoftaar, Randall Wald, Amri Napolitano


Citation count: 3

Abstract

Gene selection is an essential step in much bioinformatics research, needed to handle the thousands or tens of thousands of gene expression levels generated by gene microarrays. It is especially important that this gene selection is robust and produces consistent results even in the face of changes to the dataset. Ensemble gene selection can help improve robustness by combining gene rankings from multiple gene selection techniques into a single gene subset. Typically this is done by performing multiple runs of feature (gene) selection, finding each gene's rank within the different runs, and aggregating these ranks into a final ranked list. However, another option exists: instead of ranking each list and then aggregating, the raw scores produced by the gene ranking algorithms (which would normally be compared to generate a ranking) are aggregated directly, and these aggregate scores are used to create a final ranking. This potentially results in a different final ranking, since adjacent genes (i.e., those with no genes in between them) whose scores are particularly close to or far from one another will be treated as such. Score aggregation can also reduce computation time, because the ranking step takes place only once rather than separately for each list being aggregated. In this experiment, we use eleven DNA microarray datasets and nine univariate feature selection techniques, along with twelve feature subset sizes, to demonstrate these two approaches on a commonly used aggregation technique: mean aggregation. The results show that for seven of the nine feature selection techniques there is strong similarity between the two approaches, although the feature subsets are not identical. However, two of the techniques do show high levels of diversity between the two approaches. This allows us to state that further research is required in order to determine the abilities of the two approaches.
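The difference between the two approaches described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy scores, and the tie-breaking rule (by gene index) are all assumptions made for illustration, and it assumes that a higher score means a more relevant gene and that scores from the different runs are on a comparable scale.

```python
from statistics import mean


def rank_of(scores):
    """Return 1-based ranks (1 = highest score); ties broken by gene index (an assumption)."""
    order = sorted(range(len(scores)), key=lambda g: (-scores[g], g))
    ranks = [0] * len(scores)
    for position, gene in enumerate(order, start=1):
        ranks[gene] = position
    return ranks


def rank_based_mean(score_lists):
    """Rank every list separately, average the ranks per gene, then order by mean rank."""
    rank_lists = [rank_of(scores) for scores in score_lists]
    n_genes = len(score_lists[0])
    mean_ranks = [mean(r[g] for r in rank_lists) for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: (mean_ranks[g], g))


def score_based_mean(score_lists):
    """Average the raw scores per gene, then rank only once on the mean scores."""
    n_genes = len(score_lists[0])
    mean_scores = [mean(s[g] for s in score_lists) for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: (-mean_scores[g], g))


def top_k_overlap(order_a, order_b, k):
    """Fraction of genes shared by the top-k subsets of two final orderings."""
    return len(set(order_a[:k]) & set(order_b[:k])) / k


# Toy data (hypothetical): three selection runs scoring four genes.
# Gene 0 narrowly beats gene 1 in two runs; gene 1 beats gene 0 by a wide margin in the third.
runs = [
    [0.90, 0.89, 0.10, 0.05],
    [0.91, 0.90, 0.12, 0.04],
    [0.20, 0.95, 0.15, 0.10],
]

by_rank = rank_based_mean(runs)    # gene 0 first: it wins two of the three rankings
by_score = score_based_mean(runs)  # gene 1 first: its mean raw score is higher
print(by_rank, by_score, top_k_overlap(by_rank, by_score, 1))
```

In this toy case the two final orderings disagree on the top gene because rank aggregation discards how large the score gaps were, while score aggregation keeps them. With a subset size of 2 the two approaches select the same genes, which mirrors the abstract's observation that the subsets can be similar without being identical.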

