Effective framework for protein structure prediction

Nagamma Patil, Durga Toshniwal, K. Garg
Int. J. Funct. Informatics Pers. Medicine. Journal article, published 2012-11-21. DOI: 10.1504/IJFIPM.2012.050426 (https://doi.org/10.1504/IJFIPM.2012.050426)
Citations: 2

Abstract

This paper presents a computational system that predicts protein structure using N-grams and a wrapper feature selection framework (an N-gram is a subsequence of N characters extracted from a larger sequence). N-gram features are extracted from a dataset of 277 domains: 70 all-α domains, 61 all-β domains, 81 α/β domains and 65 α + β domains. A wrapper feature selection system, GA-SVM, is applied to obtain an optimised feature set. Using the optimised subset of 3,070 features, a classifier model is trained and tested in the Support Vector Machine (SVM) learning system. The model achieves an overall accuracy of 88.09% under 10-fold cross-validation, which is 4.7% higher than the accuracy obtained with the initial 6,414 features. Experimental results also show that feature subset selection with the proposed GA-SVM wrapper improves classification accuracy compared with other GA-based wrapper approaches and existing protein sequence encoding methods.
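The N-gram extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `ngram_counts` and `build_feature_matrix`, the default `max_n=3`, and the use of raw counts as feature values are all assumptions made for illustration, since the abstract does not state which values of N or which feature weighting were used.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def ngram_counts(sequence: str, n: int) -> Counter:
    """Count every overlapping length-n substring (N-gram) of a sequence."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def build_feature_matrix(
    sequences: Sequence[str], max_n: int = 3  # max_n is an assumed setting
) -> Tuple[List[str], List[List[int]]]:
    """Return (vocabulary, matrix) of N-gram counts for N = 1..max_n.

    The vocabulary contains only N-grams observed in the input, sorted
    alphabetically; row i of the matrix holds the counts for sequences[i].
    """
    per_sequence = []
    for seq in sequences:
        counts = Counter()
        for n in range(1, max_n + 1):
            counts.update(ngram_counts(seq, n))
        per_sequence.append(counts)

    # Union of all observed N-grams across the dataset.
    vocabulary = sorted(set().union(*per_sequence))
    # A Counter returns 0 for N-grams that a sequence does not contain.
    matrix = [[counts[gram] for gram in vocabulary] for counts in per_sequence]
    return vocabulary, matrix


if __name__ == "__main__":
    # Toy amino-acid strings, not taken from the paper's dataset.
    vocab, rows = build_feature_matrix(["ACA", "CC"], max_n=2)
    print(vocab)
    print(rows)
```

In a wrapper scheme such as the GA-SVM approach the abstract names, each candidate feature subset would correspond to a selection of columns from a matrix like this one, scored by the cross-validated accuracy of a classifier trained on those columns.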