预测不同的M-best蛋白接触图

2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) Pub Date : 2015-11-09 DOI:10.1109/BIBM.2015.7359865

S. Sun, Jianzhu Ma, Sheng Wang, Jinbo Xu

{"title":"预测不同的M-best蛋白接触图","authors":"S. Sun, Jianzhu Ma, Sheng Wang, Jinbo Xu","doi":"10.1109/BIBM.2015.7359865","DOIUrl":null,"url":null,"abstract":"Protein contacts contain important information for protein structure and functional study, but contact prediction from sequence information remains very challenging. Recently evolutionary coupling (EC) analysis, which predicts contacts by detecting co-evolved residues (or columns) in a multiple sequence alignment (MSA), has made good progress due to better statistical assessment techniques and high-throughput sequencing. Existing EC analysis methods predict only a single contact map for a given protein, which may have low accuracy especially when the protein under prediction does not have a large number of sequence homologs. Analogous to ab initio folding that usually predicts a few possible 3D models for a given protein sequence, this paper presents a novel structure learning method that can predict a set of diverse contact maps for a given protein sequence, in which the best solution usually has much better accuracy than the first one. Our experimental tests show that for many test proteins, the best out of 5 solutions generated by our method has accuracy at least 0.1 better than the first one when the top L/5 or L/10 (L is the sequence length) predicted long-range contacts are evaluated, especially for protein families with a small number of sequence homologs. Our best solutions also have better quality than those generated by the two popular EC methods Evfold and PSICOV.","PeriodicalId":186217,"journal":{"name":"2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Predicting diverse M-best protein contact maps\",\"authors\":\"S. Sun, Jianzhu Ma, Sheng Wang, Jinbo Xu\",\"doi\":\"10.1109/BIBM.2015.7359865\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Protein contacts contain important information for protein structure and functional study, but contact prediction from sequence information remains very challenging. Recently evolutionary coupling (EC) analysis, which predicts contacts by detecting co-evolved residues (or columns) in a multiple sequence alignment (MSA), has made good progress due to better statistical assessment techniques and high-throughput sequencing. Existing EC analysis methods predict only a single contact map for a given protein, which may have low accuracy especially when the protein under prediction does not have a large number of sequence homologs. Analogous to ab initio folding that usually predicts a few possible 3D models for a given protein sequence, this paper presents a novel structure learning method that can predict a set of diverse contact maps for a given protein sequence, in which the best solution usually has much better accuracy than the first one. Our experimental tests show that for many test proteins, the best out of 5 solutions generated by our method has accuracy at least 0.1 better than the first one when the top L/5 or L/10 (L is the sequence length) predicted long-range contacts are evaluated, especially for protein families with a small number of sequence homologs. Our best solutions also have better quality than those generated by the two popular EC methods Evfold and PSICOV.\",\"PeriodicalId\":186217,\"journal\":{\"name\":\"2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)\",\"volume\":\"5 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-11-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/BIBM.2015.7359865\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/BIBM.2015.7359865","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

蛋白质接触是蛋白质结构和功能研究的重要信息，但从序列信息中预测蛋白质接触仍然具有很大的挑战性。进化耦合分析(EC)是一种通过检测多序列比对(MSA)中的共同进化残基(或列)来预测接触的分析方法，近年来由于更好的统计评估技术和高通量测序而取得了很好的进展。现有的EC分析方法只能预测给定蛋白质的单一接触图谱，特别是当预测的蛋白质没有大量的序列同源物时，精度可能较低。与从头算折叠通常预测给定蛋白质序列的几种可能的3D模型类似，本文提出了一种新的结构学习方法，该方法可以预测给定蛋白质序列的一组不同的接触映射，其中最佳解通常比第一个解具有更高的精度。我们的实验测试表明，对于许多测试蛋白，当评估预测远程接触的最高L/5或L/10 (L为序列长度)时，我们的方法生成的5个解决方案中的最佳解决方案比第一个解决方案精度至少提高0.1，特别是对于具有少量序列同源的蛋白质家族。我们的最佳解决方案也比两种流行的EC方法Evfold和PSICOV产生的质量更好。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Predicting diverse M-best protein contact maps

Protein contacts contain important information for protein structure and functional study, but contact prediction from sequence information remains very challenging. Recently evolutionary coupling (EC) analysis, which predicts contacts by detecting co-evolved residues (or columns) in a multiple sequence alignment (MSA), has made good progress due to better statistical assessment techniques and high-throughput sequencing. Existing EC analysis methods predict only a single contact map for a given protein, which may have low accuracy especially when the protein under prediction does not have a large number of sequence homologs. Analogous to ab initio folding that usually predicts a few possible 3D models for a given protein sequence, this paper presents a novel structure learning method that can predict a set of diverse contact maps for a given protein sequence, in which the best solution usually has much better accuracy than the first one. Our experimental tests show that for many test proteins, the best out of 5 solutions generated by our method has accuracy at least 0.1 better than the first one when the top L/5 or L/10 (L is the sequence length) predicted long-range contacts are evaluated, especially for protein families with a small number of sequence homologs. Our best solutions also have better quality than those generated by the two popular EC methods Evfold and PSICOV.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

自引率

0.00%

发文量