Prediction of protein secondary structure with a reliability score estimated by local sequence clustering.

Protein Engineering. Published: 2003-09-01. DOI: 10.1093/protein/gzg089

Fan Jiang

Protein Engineering, vol. 16, no. 9, pp. 651–657.

Citations: 15

Abstract

Most algorithms for protein secondary structure prediction are based on machine learning techniques, such as neural networks. Better architectures and learning methods have steadily improved their performance, and the introduction of profile methods such as PSI-BLAST was a major breakthrough, raising prediction accuracy to nearly 80%. In this paper, a brute-force algorithm is proposed in which the reliability of each prediction is estimated by a z-score based on local sequence clustering. The algorithm is intended to perform well for those secondary structures in a protein whose formation is dominated mainly by the neighboring sequence and short-range interactions. The reliability z-score is defined to estimate the quality of a putative cluster found in a database for a query sequence. The prediction database was built from experimentally determined, non-redundant protein structures with <25% sequence identity, taken from the list maintained by PDBSELECT. Our tests show that the new algorithm, which belongs to the class of nearest-neighbor methods, performs well, within the range expected of previous methods, and that the reliability z-score as defined correlates with the reliability of prediction. This makes it possible to predict a few selected residues in a protein very accurately, with Q3 > 80%. Further development of the algorithm and a nucleation mechanism for protein folding are also suggested.
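The abstract does not give the scoring function or the exact z-score formula, so the following is only a minimal sketch of the general nearest-neighbor idea under stated assumptions, not the paper's method. Each residue is represented by a window of 2·half+1 residues; similarity is a plain identity count (the paper may instead use a substitution matrix or profiles); the k best-scoring database windows form the putative local cluster; the prediction is a majority vote over their central residues; and the reliability z-score is assumed to be the cluster's mean score relative to the mean and standard deviation of all window scores. The names `windows`, `identity_score`, `predict`, `q3`, `selective_q3`, `PAD` and the parameters `half`, `k`, `z_min` are hypothetical. Q3 is the standard percentage of residues whose three-state (H, E, C) prediction matches the observed state.

```python
import statistics
from collections import Counter

PAD = "X"  # hypothetical padding symbol for positions beyond the chain termini


def windows(seq, half):
    """Return the window of length 2*half+1 centred on each residue of seq."""
    padded = PAD * half + seq + PAD * half
    return [padded[i:i + 2 * half + 1] for i in range(len(seq))]


def identity_score(a, b):
    """Assumed similarity: count of identical, non-padding positions."""
    return sum(1 for x, y in zip(a, b) if x == y and x != PAD)


def predict(query, database, half=3, k=5):
    """Nearest-neighbour 3-state prediction with a per-residue cluster z-score.

    database: list of (sequence, ss) pairs, where ss uses 'H', 'E', 'C'.
    Returns one (predicted_state, z_score) tuple per query residue.
    """
    # Brute force: collect every database window with its central residue's state.
    db = [(win, state)
          for seq, ss in database
          for win, state in zip(windows(seq, half), ss)]
    results = []
    for qwin in windows(query, half):
        # Stable sort on score, so ties keep database order.
        scored = sorted(((identity_score(qwin, w), s) for w, s in db),
                        key=lambda t: t[0], reverse=True)
        top = scored[:k]  # the putative local cluster
        state = Counter(s for _, s in top).most_common(1)[0][0]  # majority vote
        all_scores = [sc for sc, _ in scored]
        mu = statistics.mean(all_scores)
        sigma = statistics.pstdev(all_scores)
        top_mean = statistics.mean([sc for sc, _ in top])
        # Assumed z-score: how far the cluster stands above the background scores.
        z = (top_mean - mu) / sigma if sigma > 0 else 0.0
        results.append((state, z))
    return results


def q3(predicted, observed):
    """Percentage of residues whose 3-state prediction matches the observed state."""
    if not observed or len(predicted) != len(observed):
        raise ValueError("need equal-length, non-empty state sequences")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def selective_q3(results, observed, z_min):
    """Q3 restricted to residues whose z-score is at least z_min.

    Returns (q3_value, n_kept); q3_value is None when no residue passes.
    """
    kept = [(state, obs) for (state, z), obs in zip(results, observed) if z >= z_min]
    if not kept:
        return None, 0
    pred, obs = zip(*kept)
    return q3(list(pred), list(obs)), len(kept)
```

In use, one could evaluate `selective_q3(predict(query, database), observed, z_min)` over a range of thresholds and check whether accuracy rises as the fraction of residues kept falls, which is the kind of relationship between the reliability z-score and accuracy that the abstract describes. A real database window search would also need a substitution-aware score and a faster index than this exhaustive scan.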


