Classification and Identification of Differential Gene Expression for Microarray Data: Improvement of the Random Forest Method

Xiao-yan Wu, Zhenyu Wu, Kang Li
Published in: 2008 2nd International Conference on Bioinformatics and Biomedical Engineering, pp. 763-766. Publication date: 2008-05-16. DOI: 10.1109/ICBBE.2008.186 (https://doi.org/10.1109/ICBBE.2008.186)
Citations: 5

Abstract

Classification and gene selection are central tasks in the analysis of microarray gene expression data in biomedical research. The high dimensionality of gene expression data poses a challenge for statistical methods, and random forest has been used to address it. We present a new classifier, Recursive Random Forest, which builds on random forest to select genes automatically and to improve classification accuracy. Three microarray datasets (ALL-AML Leukemia data, Colon Cancer data and Prostate cancer data) were analyzed using Recursive Random Forest. Although only a few genes were selected from each dataset, they were effective for cancer prediction, and their biological functions have been confirmed. On the ALL-AML Leukemia data in particular, the method achieved perfect accuracy on the test set using only three genes (selected from over 7000). We also examine the properties of random forest and Recursive Random Forest in simulation experiments. Because it performs gene selection, Recursive Random Forest provides more useful information than random forest alone for further biological experiments, clinical diagnosis and disease therapy, and it may become a valuable tool for sample classification and gene selection in microarray data. Source code written in R for Recursive Random Forest is available from http://vxzv.hrbmu.edu.cn/gongwei/biostatistics/.
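The abstract does not spell out how Recursive Random Forest chooses genes, so the following is only a minimal sketch of one plausible reading: rank genes by a forest-based importance score, discard the lowest-ranked fraction, and repeat on the survivors. It is not the authors' R implementation. To stay self-contained it uses a forest of depth-one trees (stumps) as a stand-in for a full random forest, and every name and parameter here (`recursive_selection`, `stump_forest_importance`, `drop_fraction`, `n_keep`, `make_toy_data`) is hypothetical.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)

def best_stump(X, y, features):
    """Return (feature, impurity decrease) of the best one-threshold split."""
    n = len(y)
    parent = gini(y)
    best = (None, 0.0)
    for f in features:
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between adjacent values
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            gain = parent - (len(left) * gini(left) + len(right) * gini(right)) / n
            if gain > best[1]:
                best = (f, gain)
    return best

def stump_forest_importance(X, y, features, n_trees=200, rng=None):
    """Sum the impurity decrease per feature over bootstrap-trained stumps.

    Assumption: stumps stand in for the full trees of a random forest.
    """
    rng = rng or random.Random(0)
    mtry = max(1, int(len(features) ** 0.5))  # features tried per tree
    importance = {f: 0.0 for f in features}
    n = len(y)
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        f, gain = best_stump(Xb, yb, rng.sample(features, mtry))
        if f is not None:
            importance[f] += gain
    return importance

def recursive_selection(X, y, n_keep=3, drop_fraction=0.5, seed=0):
    """Repeatedly drop the least important features until n_keep remain."""
    rng = random.Random(seed)
    features = list(range(len(X[0])))
    while len(features) > n_keep:
        imp = stump_forest_importance(X, y, features, rng=rng)
        ranked = sorted(features, key=lambda f: imp[f], reverse=True)
        n_next = max(n_keep, int(len(ranked) * (1 - drop_fraction)))
        n_next = min(n_next, len(ranked) - 1)  # always remove at least one
        features = ranked[:n_next]
    return sorted(features)

def make_toy_data(n_samples=40, n_genes=20, seed=1):
    """Synthetic data: genes 0 and 1 track the class label, the rest are noise."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n_samples)]
    X = []
    for label in y:
        row = [rng.gauss(0, 1) for _ in range(n_genes)]
        row[0] = label + rng.gauss(0, 0.1)
        row[1] = -label + rng.gauss(0, 0.1)
        X.append(row)
    return X, y

X, y = make_toy_data()
selected = recursive_selection(X, y, n_keep=2)
print("selected gene indices:", selected)
```

On real microarray data one would normally replace the stump forest with a full random forest implementation and judge the selected genes on a held-out test set, as the abstract reports for the ALL-AML Leukemia data.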