Malphite: A convolutional neural network and ensemble learning based protein secondary structure predictor
Authors: Y. Li, T. Shibuya
Published in: 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2015-11-09
DOI: 10.1109/BIBM.2015.7359861 (https://doi.org/10.1109/BIBM.2015.7359861)
Citations: 15
Abstract
We developed Malphite, a method for predicting protein secondary structure based on convolutional neural networks (CNNs) and ensemble learning. Malphite has three sub-models: a first CNN, PSI-PRED, and a second CNN. The first CNN and PSI-PRED each predict an initial secondary structure from the position-specific scoring matrix generated by PSI-BLAST. The second CNN performs ensemble learning by combining the predictions of the first CNN and PSI-PRED to generate the final predictions. Malphite achieved Q3 scores of 82.3% and 82.6% on independently built datasets of 400 and 538 proteins, respectively, and a ten-fold cross-validated accuracy of 82.6% on a dataset of 3000 proteins. In addition, Malphite achieved a Q3 score of 83.6% on 122 targets from CASP10 (Critical Assessment of protein Structure Prediction), surpassing any secondary structure prediction technique to date. On all four datasets, Malphite's predictions are consistently 2% more accurate than those of PSI-PRED, a significant step towards the estimated upper limit of 90% for protein secondary structure prediction accuracy.
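For readers unfamiliar with the metric, Q3 is the percentage of residues whose predicted three-state label (helix H, strand E, coil C) matches the observed label. The following is a minimal sketch of that calculation for one protein, not the authors' evaluation code. The function name `q3_score` and the toy strings are illustrative assumptions, and the abstract does not say whether dataset-level scores are pooled over all residues or averaged per protein.

```python
def q3_score(predicted: str, observed: str) -> float:
    """Return the percentage of positions where the three-state labels agree.

    Both arguments are strings over the alphabet H (helix), E (strand),
    C (coil), one character per residue. Hypothetical helper for illustration.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    # Count residues whose predicted state equals the observed state.
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Toy example (made-up labels, not data from the paper).
observed = "CCHHHHHCCEEEEC"
predicted = "CCHHHHCCCEEEEC"
print(f"Q3 = {q3_score(predicted, observed):.1f}%")
```

Under this definition, a two-point gain in Q3 on a 200-residue protein corresponds to about four more residues assigned to the correct state.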