Protein secondary structure prediction with high accuracy using Support Vector Machine

M. Shoyaib, S. M. Baker, T. Jabid, F. Anwar, Haseena Khan
Published in: 2007 10th International Conference on Computer and Information Technology
Publication date: 2007-12-01
DOI: 10.1109/ICCITECHN.2007.4579365 (https://doi.org/10.1109/ICCITECHN.2007.4579365)
Citations: 6

Abstract

Mining bioinformatics data is an emerging area of research, and proteomics is one of the largest areas of focus in bioinformatics and data mining. Protein structure prediction is one of the most crucial and decisive problems across these areas of research. Protein secondary structure can be used to determine the tertiary structure via the fold recognition method. Hence, predicting secondary structure from a protein's primary sequence has attracted the attention of many researchers. Experimental methods have proved to be complex and expensive, so developing a simple and accurate method for structure prediction is of great importance. This paper proposes a new method based on machine learning. The first step is to find frequent patterns of consecutive amino acids in a protein database, which yields a set of frequent words (the feature set). A support vector machine (SVM) is then used as a binary/tertiary classifier to classify protein secondary structure using these frequent words.
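
The first two steps described in the abstract (mining frequent words, then turning them into a feature set) might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the word length, the support threshold, or the encoding, so `k`, `min_support`, the binary presence encoding, and the toy sequences are all hypothetical.

```python
from collections import Counter


def frequent_words(sequences, k=3, min_support=2):
    """Return the k-mers that occur in at least `min_support` sequences.

    Assumption: a "frequent pattern of consecutive amino acids" is taken
    to be a fixed-length k-mer; the paper may define it differently.
    """
    support = Counter()
    for seq in sequences:
        # Count each k-mer at most once per sequence.
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        support.update(kmers)
    return sorted(w for w, n in support.items() if n >= min_support)


def encode(segment, vocabulary, k=3):
    """Binary feature vector: 1 where a frequent word occurs in the segment."""
    present = {segment[i:i + k] for i in range(len(segment) - k + 1)}
    return [1 if w in present else 0 for w in vocabulary]


# Toy sequences, made up for illustration only.
sequences = ["ALKAALKE", "GALKAG", "PPGALK"]
vocab = frequent_words(sequences, k=3, min_support=2)
features = [encode(s, vocab) for s in sequences]
```

Feature vectors of this kind, paired with secondary-structure labels, could then be passed to any SVM implementation for the classification step.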