Prediction of protein secondary structure using the 3D-1D compatibility algorithm.

M Ito, Y Matsuo, K Nishikawa
Computer applications in the biosciences : CABIOS, 1997-08-01 (Journal Article). DOI: 10.1093/bioinformatics/13.4.415 (https://doi.org/10.1093/bioinformatics/13.4.415)
Citations: 49

Abstract

A new method for the prediction of protein secondary structure is proposed, which relies entirely on the global aspect of a protein. The prediction scheme is as follows. A structural library is first scanned with a query sequence using the previously developed 3D-1D compatibility method. All the structures examined are sorted by compatibility score and the top 50 in the list are selected. Then, the known secondary structures of these 50 proteins are globally aligned against the query sequence, according to the 3D-1D alignments. Each residue site is predicted as alpha helix, beta strand or coil by taking the majority among the observations at that site. In addition to the 325 proteins in the structural library, 77 proteins were selected from the latest release of the Brookhaven Protein Data Bank and divided into three data sets. Data set 1 was used as a training set on which several adjustable parameters of the method were optimized. Then, the final form of the method was applied to a testing set (data set 2), which contained proteins of chain length ≤ 400 residues. The average prediction accuracy was 69% in the three-state assessment of alpha, beta and coil. Data set 3, on the other hand, contains only proteins of length > 400 residues, for which the present method would not work properly because of the size effect inherent in the 3D-1D compatibility method. The proteins in data set 3 were therefore subdivided into their constituent domains (data set 4) before being fed into the prediction program. The prediction accuracy for data set 4 was 66% on average, a few percent lower than that for data set 2. Possible causes for this discrepancy are discussed.
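
The per-site majority vote and the three-state accuracy described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' program: it assumes the 3D-1D alignments have already been produced and that each library protein's secondary structure is given as a string already projected onto the query positions. The one-letter codes (`H`, `E`, `C`), the gap symbol `-`, the tie-breaking order, the fallback to coil for sites with no observations, and the function names `majority_vote` and `q3_accuracy` are all assumptions made for this example.

```python
from collections import Counter

# Assumed one-letter codes: H = alpha helix, E = beta strand, C = coil.
STATES = ("H", "E", "C")


def majority_vote(aligned_structures, query_length, default="C"):
    """Predict one state per query residue by majority among aligned observations.

    aligned_structures: strings of length query_length, one per library
    protein, using '-' (assumed gap symbol) where the alignment gives no
    observation for that query site.
    """
    prediction = []
    for i in range(query_length):
        # Count only real observations at this site; gaps are ignored.
        counts = Counter(s[i] for s in aligned_structures if s[i] in STATES)
        if not counts:
            # No observation at this site: fall back to coil (assumption).
            prediction.append(default)
            continue
        # Ties are broken by the order of STATES (assumption): H, then E, then C.
        prediction.append(max(STATES, key=lambda state: counts[state]))
    return "".join(prediction)


def q3_accuracy(predicted, observed):
    """Fraction of residues whose predicted state matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


if __name__ == "__main__":
    # Three toy aligned structures instead of the 50 used in the paper.
    aligned = ["HHHEC", "HHEEC", "HCEE-"]
    predicted = majority_vote(aligned, query_length=5)
    print(predicted)
    print(q3_accuracy(predicted, "HHHEC"))
```

In the method described above, the input list would hold the 50 top-scoring library structures, and the accuracy would be averaged over the proteins of a data set.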
