Minimum redundancy feature selection from microarray gene expression data

Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003. Pub Date: 2003-08-11. DOI: 10.1109/CSB.2003.1227396

C. Ding, Hanchuan Peng

Citations: 2467

Abstract

Selecting a small subset of genes out of the thousands of genes in microarray data is important for accurate classification of phenotypes. Widely used methods typically rank genes according to their differential expression among phenotypes and pick the top-ranked genes. We observe that feature sets obtained this way contain redundancy, and we study methods to minimize it. Feature sets obtained through the minimum redundancy-maximum relevance framework represent a broader spectrum of phenotype characteristics than those obtained through standard ranking methods; they are more robust, generalize well to unseen data, and lead to significantly improved classification in extensive experiments on five gene expression data sets.
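The contrast between plain ranking and redundancy-aware selection can be illustrated with a minimal sketch. The abstract does not specify the scoring functions, so the code below assumes discretized expression values, uses mutual information for both relevance and redundancy, and combines them as a difference (relevance minus mean redundancy with the genes already chosen). The function names `mutual_information` and `mrmr_select` are hypothetical and chosen only for this illustration; this is one possible instantiation of the idea, not necessarily the authors' exact procedure.

```python
import numpy as np


def mutual_information(x, y):
    """Mutual information (in nats) between two discrete 1-D arrays."""
    x_vals, x_idx = np.unique(x, return_inverse=True)
    y_vals, y_idx = np.unique(y, return_inverse=True)
    # Build the joint distribution from co-occurrence counts.
    joint = np.zeros((len(x_vals), len(y_vals)))
    np.add.at(joint, (x_idx, y_idx), 1.0)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)  # marginal of x, shape (a, 1)
    py = joint.sum(axis=0, keepdims=True)  # marginal of y, shape (1, b)
    nz = joint > 0                         # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))


def mrmr_select(X, y, k):
    """Greedily pick k columns of X (samples x genes, discretized).

    Assumed criterion: relevance to y minus the mean redundancy
    with the genes selected so far.
    """
    n_features = X.shape[1]
    relevance = np.array(
        [mutual_information(X[:, j], y) for j in range(n_features)]
    )
    # Start from the single most relevant gene.
    selected = [int(np.argmax(relevance))]
    redundancy_sum = np.zeros(n_features)
    while len(selected) < min(k, n_features):
        last = selected[-1]
        # Accumulate each gene's redundancy with the newest selection.
        redundancy_sum += np.array(
            [mutual_information(X[:, j], X[:, last]) for j in range(n_features)]
        )
        score = relevance - redundancy_sum / len(selected)
        score[selected] = -np.inf  # never pick the same gene twice
        selected.append(int(np.argmax(score)))
    return selected
```

Under these assumptions, a gene that merely duplicates an already selected gene receives a low score even if it ranks highly on relevance alone, which is the redundancy effect the abstract describes.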


