Classification of cohesin family using class specific motifs

Ercument M. Eser, B. Arslan, U. Sezerman
2013 8th International Symposium on Health Informatics and Bioinformatics, published 14 November 2013. DOI: 10.1109/HIBIT.2013.6661687
Citations: 2

Abstract

Motif extraction from protein sequences has been a challenging task for bioinformaticians. Class-specific motifs, which occur frequently in one class but rarely in other classes, can be used for highly accurate classification of protein sequences. In this study, we present a new scoring-based method for selecting class-specific n-gram motifs using reduced amino acid alphabets. Cohesin protein sequences, which bind Dockerin modules to assemble the cellulosome (a multi-enzyme complex that degrades cellulose, the most common and abundant organic polymer), are used for class-specific motif selection, and the selected motifs are then given as features to the J48 and SVM algorithms. Classification results are examined across different n-gram sizes, reduced amino acid alphabets, and numbers of features. The best result, a training accuracy of 98.61% and a test accuracy of 94.54%, was obtained with the Gbmr14 alphabet, 5 features per family, 4-gram motifs, and the J48 algorithm. The proposed technique can be generalized to other protein families.