Balachandran Manavalan, K. Kuwajima, InSuk Joung, Jooyoung Lee
Structure-based protein folding type classification and folding rate prediction
2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), published 2015-11-09
DOI: https://doi.org/10.1109/BIBM.2015.7359953
Citations: 5
Abstract
Folding rate is an important property of a protein, and predicting it is useful for understanding the folding process and for guiding protein design. In this study, we developed a support vector machine (SVM) based method that predicts both the folding kinetic type of a protein (two-state or non-two-state) and its real-valued folding rate, using features calculated from the three-dimensional structure such as contact order, various properties of the non-local contact clusters, secondary structure information, and sequence length. We systematically studied the contribution of each individual feature to folding rate prediction. Using the features with the highest individual contributions, we trained the model with leave-one-out cross-validation and evaluated it on a test dataset. The Pearson correlation coefficient, mean absolute difference, and root mean square error between the predicted and experimental folding rates (base-10 logarithmic scale) are 0.814, 0.752, and 0.910 for two-state proteins, and 0.860, 0.687, and 0.876 for non-two-state proteins. Moreover, our method predicts whether a protein of known atomic structure folds with two-state or non-two-state kinetics, and it correctly classifies the folding mechanism in 80% of cases on a test dataset. Finally, we evaluated our method alongside eight existing protein folding rate prediction tools on a non-overlapping benchmark dataset. The prediction performance will also be reported and discussed.
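The three agreement measures quoted in the abstract (Pearson correlation coefficient, mean absolute difference, and root mean square error on base-10 logarithmic folding rates) can be illustrated with a minimal sketch. This is not the authors' code: the function names are hypothetical, and the `experimental` and `predicted` values are invented placeholders, not data from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mean_absolute_difference(xs, ys):
    """Average of |x - y| over paired values."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


def root_mean_square_error(xs, ys):
    """Square root of the mean squared difference over paired values."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


# Placeholder log10 folding rates, invented for illustration only
# (assumption: rates are already on the base-10 logarithmic scale).
experimental = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]

print("Pearson r:", pearson_r(experimental, predicted))
print("Mean absolute difference:", mean_absolute_difference(experimental, predicted))
print("RMSE:", root_mean_square_error(experimental, predicted))
```

Because all three measures are computed on log10 rates, a mean absolute difference of 1.0 would correspond to predictions that are off by a factor of ten on average in the folding rate itself.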