Malphite: A convolutional neural network and ensemble learning based protein secondary structure predictor

Y. Li, T. Shibuya
DOI: 10.1109/BIBM.2015.7359861 (https://doi.org/10.1109/BIBM.2015.7359861)
Published in: 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Publication date: 2015-11-09
Citations: 15

Abstract

We developed Malphite, a method based on convolutional neural networks (CNNs) and ensemble learning, to predict protein secondary structure. Malphite has three sub-models: the 1st CNN, PSI-PRED, and the 2nd CNN. The 1st CNN and PSI-PRED each predict an initial secondary structure from the position-specific scoring matrix generated by PSI-BLAST. The 2nd CNN performs ensemble learning by combining the predictions of the 1st CNN and PSI-PRED to generate the final predictions. Malphite achieved Q3 scores of 82.3% and 82.6% on independently built datasets of 400 and 538 proteins, respectively, and a ten-fold cross-validated accuracy of 82.6% on a dataset of 3000 proteins. In addition, Malphite achieved a Q3 score of 83.6% on 122 targets from CASP10 (Critical Assessment of protein Structure Prediction), surpassing any secondary structure prediction technique to date. On all four datasets, Malphite is consistently 2% more accurate than PSI-PRED, which is a significant step toward the estimated upper limit of 90% for protein secondary structure prediction accuracy.
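The Q3 score quoted above is the percentage of residues whose three-state label (helix, strand, or coil) is predicted correctly. The sketch below illustrates that metric only. The function name `q3_score`, the single-letter labels H/E/C, and the example strings are assumptions made for illustration and are not taken from the paper or its evaluation code.

```python
def q3_score(predicted: str, observed: str) -> float:
    """Return the percentage of positions where the 3-state labels agree.

    Both arguments are strings of per-residue labels, assumed here to be
    'H' (helix), 'E' (strand), and 'C' (coil).
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    # Count residues whose predicted label equals the observed label.
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Hypothetical 10-residue example: one strand residue is mispredicted as coil.
predicted = "CHHHHEEECC"
observed = "CHHHHEEEEC"
print(q3_score(predicted, observed))  # 9 of 10 residues match
```

Under this definition, a 2% difference in Q3 corresponds to two more correctly labeled residues per hundred.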