Automated protein classification using consensus decision.

Proceedings. IEEE Computational Systems Bioinformatics Conference, pp. 224-35. Pub Date: 2004-01-01. DOI: 10.1109/csb.2004.1332436

Tolga Can, Orhan Camoğlu, Ambuj K Singh, Yuan-Fang Wang



Abstract

We propose a novel technique for automatically generating the SCOP classification of a protein structure with high accuracy. High accuracy is achieved by combining the decisions of multiple methods using the consensus of a committee (or an ensemble) classifier. Our technique is rooted in machine learning, which shows that by judiciously employing component classifiers, an ensemble classifier can be constructed that outperforms its components. We use two sequence-comparison and three structure-comparison tools as component classifiers. Given a protein structure, using the joint hypothesis, we first determine whether the protein belongs to an existing category (family, superfamily, fold) in the SCOP hierarchy. For the proteins that are predicted to be members of existing categories, we compute their family-, superfamily-, and fold-level classifications using the consensus classifier. We show that we can significantly improve the classification accuracy compared to the individual component classifiers. In particular, we achieve error rates that are 3-12 times lower than the individual classifiers' error rates at the family level, 1.5-4.5 times lower at the superfamily level, and 1.1-2.4 times lower at the fold level.
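The abstract does not say how the component decisions are combined, so the following is only a minimal sketch of one generic committee rule (plurality voting with an agreement threshold), not the authors' method. The function name `consensus_label`, the `min_agreement` parameter, and the example labels are hypothetical. A component that finds no match in the existing hierarchy votes `None`, and the committee abstains (returns `None`) when too few components agree, which loosely mirrors the first step of deciding whether a protein belongs to an existing category.

```python
from collections import Counter
from typing import Optional, Sequence


def consensus_label(
    votes: Sequence[Optional[str]], min_agreement: int = 3
) -> Optional[str]:
    """Combine component-classifier votes by plurality (illustrative only).

    Each vote is a category label, or None when that component finds no
    match among existing categories. Returns the winning label, or None
    when the committee abstains (too little agreement, or a tie).
    """
    # Ignore components that reported "no existing category".
    known = [v for v in votes if v is not None]
    if not known:
        return None

    counts = Counter(known)
    top_count = max(counts.values())
    leaders = [label for label, c in counts.items() if c == top_count]

    # Abstain on ties or when agreement is below the assumed threshold.
    if len(leaders) > 1 or top_count < min_agreement:
        return None
    return leaders[0]


# Hypothetical votes from two sequence tools and three structure tools.
family_votes = ["b.1.1.1", "b.1.1.1", "b.1.1.2", "b.1.1.1", None]
print(consensus_label(family_votes))
```

In this sketch the same function could be applied separately at the family, superfamily, and fold levels; a weighted or learned combination rule would replace the simple count.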
