Impact of the Predicted Protein Structural Content on Prediction of Structural Classes for the Twilight Zone Proteins

2006 5th International Conference on Machine Learning and Applications (ICMLA'06). Pub Date: 2006-12-01. DOI: 10.1109/ICMLA.2006.27

Lukasz Kurgan, M. Rahbari, L. Homaeian


Citations: 4

Abstract

This paper addresses in silico prediction of protein structural classes as defined in the SCOP database. SCOP defines a total of 11 classes, although the majority of proteins fall into four of them: all-alpha, all-beta, alpha/beta, and alpha+beta. The main goals of this paper are to experimentally evaluate the impact of predicted protein secondary structure content on structural class prediction and to develop a novel protein sequence representation. The experiments apply three protein sequence representations and four classifiers to the prediction of both 4 and 11 structural classes. The predictions are performed on a large dataset of low-homology (twilight zone) sequences. The proposed sequence representation combines the predicted structural content, which contributes most strongly to classification, with composition and composition moment vectors, hydrophobic autocorrelations, chemical group composition, and the molecular weight of the protein. On average, the predicted content values improve the prediction accuracy by 3.3% and 4.2% for the 4 and 11 classes, respectively, compared with a sequence representation that does not use this information. Finally, we propose a very compact, 20-dimensional sequence representation that is shown to improve the prediction accuracy by 5.1-8.5% compared with recently published results.
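As an illustration of one ingredient named above, the composition vector, here is a minimal Python sketch that computes the fraction of each of the 20 standard amino acids in a sequence. The function name `composition_vector`, the residue ordering, and the handling of non-standard characters are assumptions made for this example. The abstract does not specify them, and this is not claimed to be the authors' implementation or their 20-dimensional representation.

```python
from collections import Counter

# The 20 standard amino acids as one-letter codes (ordering is an assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_vector(sequence: str) -> list[float]:
    """Return the fraction of each standard amino acid in `sequence`.

    Characters outside the 20 standard residues are ignored; this is an
    illustrative choice, not something stated in the abstract.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    # One entry per residue type, in the fixed AMINO_ACIDS order.
    return [counts[aa] / total for aa in AMINO_ACIDS]


if __name__ == "__main__":
    vec = composition_vector("ACDAAC")
    # Print each residue next to its fraction of the sequence.
    for aa, frac in zip(AMINO_ACIDS, vec):
        print(aa, round(frac, 3))
```

A vector like this could be concatenated with other features, such as predicted secondary structure content, before being passed to a classifier. How the features are actually combined in the paper is not described in the abstract.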
