GroupAdaBoost: Accurate Prediction and Selection of Important Genes
Authors: Takashi Takenouchi, M. Ushijima, S. Eguchi
Journal: IPSJ Digital Courier
Published: 2007-04-15 (Journal Article)
DOI: 10.2197/IPSJDC.3.145 (https://doi.org/10.2197/IPSJDC.3.145)
Citations: 4
Abstract
In this paper, we propose GroupAdaBoost, a variant of AdaBoost for statistical pattern recognition. The objective of the proposed algorithm is to address the "p ≫ n" problem that arises in bioinformatics. In a microarray experiment, gene expression levels are measured in order to extract patterns of expression related to disease status. Typically, p is the number of genes investigated and n is the number of individuals. Because of the "p ≫ n" problem, ordinary methods for predicting the genetic causes of disease are apt to overfit a particular training dataset. We observed that GroupAdaBoost performed robustly even when the number of genes p was excessive. We compared the proposed method with other methods on several publicly available real datasets, and conducted a small-scale simulation study to confirm its validity. In addition, the proposed method worked effectively for identifying important genes.
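
For orientation, the following is a minimal sketch of ordinary AdaBoost with single-gene decision stumps, run on synthetic data with p ≫ n. It is the baseline of which GroupAdaBoost is a variant, not GroupAdaBoost itself: the abstract does not describe the grouping step, so it is not reproduced here. The choice of decision stumps as weak learners, the function names (`stump_predict`, `fit_stump`, `adaboost`, `predict`), the synthetic data, and all parameter values are illustrative assumptions.

```python
import math
import random


def stump_predict(x, j, thr, sign):
    """Classify one sample by thresholding the expression of gene j."""
    return sign if x[j] > thr else -sign


def fit_stump(X, y, w):
    """Exhaustively search for the stump with the lowest weighted error."""
    n, p = len(X), len(X[0])
    best = None
    for j in range(p):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                err = sum(w[i] for i in range(n)
                          if stump_predict(X[i], j, thr, sign) != y[i])
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best


def adaboost(X, y, rounds=5):
    """Plain AdaBoost; returns a list of (alpha, gene, threshold, sign)."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, j, thr, sign = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, thr, sign))
        # Up-weight misclassified samples, down-weight correct ones.
        w = [w[i] * math.exp(-alpha * y[i] * stump_predict(X[i], j, thr, sign))
             for i in range(n)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(x, j, thr, s) for a, j, thr, s in model)
    return 1 if score >= 0 else -1


# Synthetic "microarray" (an assumption for illustration): n = 20
# individuals, p = 200 genes, and only gene 0 carries signal.
random.seed(0)
n, p = 20, 200
y = [1 if i % 2 == 0 else -1 for i in range(n)]
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
for i in range(n):
    X[i][0] = y[i] + random.uniform(-0.5, 0.5)

model = adaboost(X, y, rounds=5)
train_acc = sum(predict(model, x) == t for x, t in zip(X, y)) / n
print("first selected gene:", model[0][1])
print("training accuracy:", train_acc)
```

Each boosting round selects a single gene, so the list of selected gene indices in `model` doubles as a crude gene ranking. With far more genes than individuals, some pure-noise genes can also fit the training labels well by chance, which is the over-learning risk that motivates the proposed method.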