Folding Large Proteins by Ultra-Deep Learning

Jinbo Xu
Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, published 2017-08-20
DOI: 10.1145/3107411.3107456
Citations: 1

Abstract

Ab initio protein folding is one of the most challenging problems in computational biology. The popular fragment assembly method can fold mainly small proteins. Contact-assisted folding has recently made some progress, but it requires accurate contact prediction, which existing methods achieve only for proteins with a very large number (>500 or 1000) of sequence homologs. To deal with proteins that lack so many sequence homologs, we have developed a novel deep learning model for contact prediction by concatenating two deep residual neural networks (ResNets), the architecture that performed best in the 2015 computer vision challenges. The first ResNet applies convolutional transformations to 1-dimensional (per-residue) features, and the second applies convolutional transformations to 2-dimensional (pairwise) information, including the output of the first. Experimental results suggest that our deep learning method greatly outperforms existing contact prediction methods and doubles the accuracy of pure co-evolution methods on proteins without many sequence homologs. Our method ranked 1st in terms of total F1 score in the latest CASP competition (CASP12), even though it was not fully implemented at that time (May to July 2016). Our predicted contacts also lead to much more accurate contact-assisted folding. Blindly tested since October 2016 in the weekly benchmark CAMEO (which can be interpreted as a fully automated CASP), our fully automated web server implementing this method has successfully folded many large, hard targets (up to 600 residues) that have neither good templates nor many sequence homologs. Our large-scale benchmark indicates that ab initio folding based on predicted contacts can now correctly fold more than 2/3 of randomly chosen proteins. We have also applied this method to membrane protein contact prediction, obtaining very good results in terms of both contact prediction accuracy and folding.
An important finding is that, even when trained only on non-membrane proteins, our deep model works very well for membrane protein contact prediction and folding. This is because the model learns to predict contacts from contact occurrence patterns (which are shared between membrane and non-membrane proteins) rather than from sequence similarity. This method can also be extended to protein-protein interaction prediction, protein complex prediction, and protein docking. Our web server implementing this method is publicly available at http://raptorx.uchicago.edu/ContactMap/. For technical details and results, please see our papers [1-2].
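The abstract describes the architecture only at a high level: a 1D ResNet over per-residue features whose output, together with 2D pairwise information (for example, co-evolution scores), feeds a 2D ResNet that predicts contacts. The following is a minimal, dependency-free sketch of that forward pass under stated assumptions, not the authors' implementation. The layer counts, channel widths, kernel size, random untrained weights, all helper names (`init_params`, `predict_contacts`, etc.), and the 1D-to-2D step (pair (i, j) receives the concatenation of the 1D outputs for residues i and j plus the pairwise input features) are hypothetical choices made for illustration.

```python
import math
import random

# Toy forward pass: a 1D ResNet whose output feeds a 2D ResNet.
# Sizes, helper names and the 1D-to-2D step are hypothetical choices.

def rand_tensor(shape, rng, scale=0.1):
    """Nested lists of small random weights with the given shape."""
    if len(shape) == 1:
        return [rng.uniform(-scale, scale) for _ in range(shape[0])]
    return [rand_tensor(shape[1:], rng, scale) for _ in range(shape[0])]

def add_relu(a, b):
    """Residual connection: elementwise sum followed by ReLU."""
    return [max(0.0, u + v) for u, v in zip(a, b)]

def conv1d(x, w, b):
    """Zero-padded 1D convolution. x: L x Cin, w: K x Cin x Cout, b: Cout."""
    n, k = len(x), len(w)
    out = [list(b) for _ in range(n)]
    for i in range(n):
        for dk in range(k):
            j = i + dk - k // 2
            if 0 <= j < n:
                for ci, v in enumerate(x[j]):
                    for co, wv in enumerate(w[dk][ci]):
                        out[i][co] += v * wv
    return out

def conv2d(x, w, b):
    """Zero-padded 2D convolution. x: L x L x Cin, w: K x K x Cin x Cout."""
    n, k = len(x), len(w)
    out = [[list(b) for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for di in range(k):
                for dj in range(k):
                    a, c = i + di - k // 2, j + dj - k // 2
                    if 0 <= a < n and 0 <= c < n:
                        for ci, v in enumerate(x[a][c]):
                            for co, wv in enumerate(w[di][dj][ci]):
                                out[i][j][co] += v * wv
    return out

def res_block_1d(x, p):
    h = [[max(0.0, v) for v in row] for row in conv1d(x, *p["c1"])]
    return [add_relu(xr, hr) for xr, hr in zip(x, conv1d(h, *p["c2"]))]

def res_block_2d(x, p):
    h = conv2d(x, *p["c1"])
    h = [[[max(0.0, v) for v in cell] for cell in row] for row in h]
    h = conv2d(h, *p["c2"])
    return [[add_relu(xc, hc) for xc, hc in zip(xr, hr)] for xr, hr in zip(x, h)]

def init_params(d1, d2, c1, c2, k=3, n1=2, n2=2, seed=0):
    """Untrained weights. d1/d2: 1D/2D input features; c1/c2: channel widths."""
    rng = random.Random(seed)
    def conv(shape):
        return rand_tensor(shape, rng), [0.0] * shape[-1]
    def block(shape):
        return {"c1": conv(shape), "c2": conv(shape)}
    return {
        "in1": conv((k, d1, c1)),               # project 1D input to c1 channels
        "res1": [block((k, c1, c1)) for _ in range(n1)],
        "in2": conv((1, 1, 2 * c1 + d2, c2)),   # 1x1 conv over pair features
        "res2": [block((k, k, c2, c2)) for _ in range(n2)],
        "out": conv((1, 1, c2, 1)),             # one logit per residue pair
    }

def predict_contacts(seq_feat, pair_feat, params):
    """seq_feat: L x d1 per-residue features; pair_feat: L x L x d2 pairwise
    features. Returns an L x L matrix of contact probabilities."""
    h = conv1d(seq_feat, *params["in1"])
    for p in params["res1"]:
        h = res_block_1d(h, p)
    n = len(h)
    # Assumed 1D -> 2D step: pair (i, j) gets [h_i, h_j, pair_feat_ij].
    x = [[h[i] + h[j] + pair_feat[i][j] for j in range(n)] for i in range(n)]
    x = conv2d(x, *params["in2"])
    for p in params["res2"]:
        x = res_block_2d(x, p)
    logits = conv2d(x, *params["out"])
    probs = [[1.0 / (1.0 + math.exp(-cell[0])) for cell in row] for row in logits]
    # Contacts are symmetric, so average the (i, j) and (j, i) predictions.
    return [[(probs[i][j] + probs[j][i]) / 2 for j in range(n)] for i in range(n)]

# Usage with random toy inputs (L = 6 residues).
rng = random.Random(1)
L, D1, D2 = 6, 4, 2
seq = [[rng.random() for _ in range(D1)] for _ in range(L)]
pair = [[[rng.random() for _ in range(D2)] for _ in range(L)] for _ in range(L)]
cmap = predict_contacts(seq, pair, init_params(D1, D2, c1=3, c2=4))
```

Because it uses nested Python lists, this sketch is only practical for toy sizes; a real model would use an array or deep learning library, many more layers, and weights trained on proteins of known structure.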