Effective framework for protein structure prediction

Int. J. Funct. Informatics Pers. Medicine · Pub Date: 2012-11-21 · DOI: 10.1504/IJFIPM.2012.050426

Nagamma Patil, Durga Toshniwal, K. Garg

Citations: 2

Abstract

This paper presents a computational system for predicting protein structure using N-grams and a wrapper feature selection framework. An N-gram is a subsequence of N characters extracted from a longer sequence. N-gram features are extracted from a dataset of 277 domains: 70 all-α domains, 61 all-β domains, 81 α/β domains and 65 α + β domains. A wrapper feature selection system, GA-SVM, is applied to obtain an optimised feature set. A classifier model is then trained and tested on the optimised subset of 3,070 features using a Support Vector Machine (SVM). Under 10-fold cross-validation, this model achieves an overall accuracy of 88.09%, which is 4.7% higher than the accuracy obtained with the initial 6,414 features. The experimental results also show that feature subset selection with the proposed GA-SVM wrapper improves classification accuracy compared with other GA-based wrapper approaches and with existing protein sequence encoding methods.
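To make the two ingredients named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It shows overlapping N-gram counting on an amino-acid string and a toy genetic algorithm that searches over feature masks. All function names, the choice of N = 1 to 3, the GA settings (truncation selection, one-point crossover, bit-flip mutation) and the `fitness` callback are assumptions made for illustration. In a GA-SVM wrapper the fitness of a mask would be the cross-validated accuracy of an SVM trained on the selected features; here a stand-in fitness function is passed in so the sketch stays dependency-free.

```python
import random
from collections import Counter


def extract_ngrams(sequence, n):
    """Count every overlapping length-n substring of `sequence`."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def build_vocabulary(sequences, n_values=(1, 2, 3)):
    """Collect the sorted set of N-grams observed in any sequence."""
    seen = set()
    for seq in sequences:
        for n in n_values:
            seen.update(extract_ngrams(seq, n))
    return sorted(seen)


def ngram_feature_vector(sequence, vocabulary, n_values=(1, 2, 3)):
    """Map one sequence to a vector of N-gram counts, ordered by `vocabulary`."""
    counts = Counter()
    for n in n_values:
        counts.update(extract_ngrams(sequence, n))
    return [counts[gram] for gram in vocabulary]


def ga_select(num_features, fitness, generations=20, pop_size=12,
              mutation_rate=0.05, seed=0):
    """Toy GA over boolean feature masks (requires num_features >= 2).

    `fitness` is a hypothetical callback: mask -> score. In a GA-SVM
    wrapper it would return SVM cross-validation accuracy for the mask.
    """
    rng = random.Random(seed)
    population = [[rng.random() < 0.5 for _ in range(num_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        # Truncation selection: keep the fitter half as parents.
        parents = sorted(population, key=fitness, reverse=True)[:pop_size // 2]
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, num_features)   # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation with probability `mutation_rate` per gene.
            child = [(not bit) if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = children
        candidate = max(population, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate
    return best
```

As a usage illustration, `extract_ngrams("MKVLA", 2)` yields one count each for `MK`, `KV`, `VL` and `LA`. Calling `ga_select(len(vocabulary), fitness)` returns a boolean mask whose `True` positions mark the retained N-gram features. Because every candidate mask requires a full train-and-validate cycle in a real wrapper, the fitness call dominates the cost, which is why population size and generation count are kept small in this sketch.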

