Using Homology Information From PDB to Improve The Accuracy of Protein β-turn Prediction by NetTurnP*
Authors: Gang Qian, Haiyan Wang, Zheng Yuan
DOI: 10.3724/SP.J.1206.2011.00370 (https://doi.org/10.3724/SP.J.1206.2011.00370)
Published: 2012-07-24 (Journal Article)
Citations: 1
Abstract
β-Turn is a type of protein secondary structure that is important in protein folding, protein stability and molecular recognition. Various methods have been proposed to predict β-turns, but none of them directly maps the structures of pre-existing homologues from structural databases such as RCSB PDB onto the protein to be predicted. Given the large size of PDB (>70 000 structures), a structural homologue is likely to exist for a newly identified sequence. In this work, we present a new method that predicts β-turns by combining homology information extracted from PDB with the predictions of NetTurnP. Two datasets, the golden set BT426 and the self-constructed dataset EVA937, are used to assess the method. For each sequence in both datasets, only homologues deposited in PDB earlier than the sequence itself are used. The method achieves Matthews correlation coefficients (MCCs) of 0.56 and 0.52 on the two datasets, respectively, compared with 0.50 and 0.46 for NetTurnP alone; the corresponding prediction accuracies (Qtotal) are 81.4% and 80.4%, compared with 78.2% and 77.3% for NetTurnP alone. These results confirm that combining homology information with state-of-the-art β-turn predictors such as NetTurnP can significantly improve prediction accuracy. A Java program called BTMapping implements the method and is freely available, together with the related datasets, at http://www.bio530.weebly.com.
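
The two figures of merit quoted in the abstract, MCC and Qtotal, follow from per-residue counts of true and false positives and negatives. The sketch below shows how they can be computed for a binary turn/non-turn labelling. The `combine` function is purely an illustrative assumption: the abstract does not say how BTMapping merges homology information with NetTurnP output, so the rule used here (take the mapped homologue label when one exists, otherwise threshold a predictor probability) is hypothetical, as are all function names and the 0.5 threshold.

```python
from math import sqrt
from typing import List, Optional, Sequence, Tuple


def combine(homology: Sequence[Optional[bool]],
            predictor_prob: Sequence[float],
            threshold: float = 0.5) -> List[bool]:
    """Hypothetical merge rule (not taken from the paper): use the label
    mapped from a homologue when available (not None), otherwise fall
    back to the thresholded predictor probability."""
    if len(homology) != len(predictor_prob):
        raise ValueError("inputs must cover the same residues")
    return [h if h is not None else p >= threshold
            for h, p in zip(homology, predictor_prob)]


def confusion(pred: Sequence[bool],
              obs: Sequence[bool]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for per-residue turn labels."""
    if len(pred) != len(obs):
        raise ValueError("pred and obs must have the same length")
    tp = sum(1 for p, o in zip(pred, obs) if p and o)
    tn = sum(1 for p, o in zip(pred, obs) if not p and not o)
    fp = sum(1 for p, o in zip(pred, obs) if p and not o)
    fn = sum(1 for p, o in zip(pred, obs) if not p and o)
    return tp, tn, fp, fn


def q_total(pred: Sequence[bool], obs: Sequence[bool]) -> float:
    """Percentage of residues whose label is predicted correctly."""
    tp, tn, fp, fn = confusion(pred, obs)
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)


def mcc(pred: Sequence[bool], obs: Sequence[bool]) -> float:
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    tp, tn, fp, fn = confusion(pred, obs)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Toy data for six residues; None means no homologue residue was mapped.
    homology = [True, None, None, False, None, None]
    prob = [0.1, 0.8, 0.3, 0.9, 0.6, 0.2]
    observed = [True, True, False, False, False, False]

    merged = combine(homology, prob)
    print(f"Qtotal = {q_total(merged, observed):.1f}%")
    print(f"MCC    = {mcc(merged, observed):.2f}")
```

MCC is reported alongside Qtotal because β-turn residues are a minority class: a predictor that labels every residue as non-turn can still reach a high Qtotal, whereas its MCC would be zero.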