GroupAdaBoost: Accurate Prediction and Selection of Important Genes

IPSJ Digital Courier, Pub Date: 2007-04-15, DOI: 10.2197/IPSJDC.3.145

Takashi Takenouchi, M. Ushijima, S. Eguchi

Citations: 4

Abstract

In this paper, we propose GroupAdaBoost, a variant of AdaBoost for statistical pattern recognition. The objective of the proposed algorithm is to solve the "p ≫ n" problem that arises in bioinformatics. In a microarray experiment, gene expressions are observed in order to extract specific expression patterns related to disease status. Typically, p is the number of investigated genes and n is the number of individuals. Because of the "p ≫ n" problem, ordinary methods for predicting the genetic causes of diseases tend to overfit the particular training dataset. We observed that GroupAdaBoost performed robustly when the number of genes p was excessively large. On several publicly available real datasets, we compared the results of the proposed method with those of other methods, and we conducted a small-scale simulation study to confirm the validity of the proposed method. Additionally, the proposed method worked effectively for identifying important genes.
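The abstract does not describe how GroupAdaBoost forms or uses groups, so the sketch below shows only the baseline it builds on: plain discrete AdaBoost with decision stumps, applied to a synthetic "p ≫ n" dataset. Everything in it is an illustrative assumption rather than the authors' method: the function names (`fit_stump`, `adaboost`, `selected_genes`), the toy data with one informative gene at index 3, and the idea of reading off "selected genes" as the features the stumps split on.

```python
import numpy as np

def stump_predict(X, j, thr, polarity):
    # A stump votes +1/-1 depending on whether gene j exceeds thr.
    return polarity * np.where(X[:, j] > thr, 1, -1)

def fit_stump(X, y, w):
    # Exhaustive search for the stump with the lowest weighted error.
    # Assumes the weights w sum to 1, so flipping polarity gives 1 - err.
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            pred = np.where(X[:, j] > thr, 1, -1)
            err = w[pred != y].sum()
            for polarity, e in ((1, err), (-1, 1.0 - err)):
                if e < best[0]:
                    best = (e, j, thr, polarity)
    return best

def adaboost(X, y, rounds=5):
    # Plain discrete AdaBoost (not GroupAdaBoost).
    n = X.shape[0]
    w = np.full(n, 1.0 / n)
    model = []
    for _ in range(rounds):
        err, j, thr, pol = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)   # avoid log(0) and division by zero
        alpha = 0.5 * np.log((1 - err) / err)
        pred = stump_predict(X, j, thr, pol)
        w = w * np.exp(-alpha * y * pred)       # up-weight misclassified samples
        w /= w.sum()
        model.append((alpha, j, thr, pol))
    return model

def predict(model, X):
    score = sum(a * stump_predict(X, j, t, s) for a, j, t, s in model)
    return np.where(score >= 0, 1, -1)

def selected_genes(model):
    # Genes the ensemble actually split on, as a crude importance readout.
    return sorted({j for _, j, _, _ in model})

# Toy "p >> n" data: n = 20 individuals, p = 200 genes, labels in {+1, -1}.
rng = np.random.default_rng(0)
n, p = 20, 200
y = np.tile([1, -1], n // 2)
X = rng.normal(size=(n, p))
X[:, 3] = 2.0 * y + 0.1 * rng.normal(size=n)    # hypothetical informative gene

model = adaboost(X, y, rounds=5)
print("training accuracy:", (predict(model, X) == y).mean())
print("genes used by the stumps:", selected_genes(model))
```

With 200 candidate genes and only 20 individuals, some stump will almost always fit the training labels well, which is why training accuracy alone says little here; held-out data or cross-validation would be needed to see the overfitting the abstract refers to.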



