Optimization of Spaced K-mer Frequency Feature Extraction using Genetic Algorithms for Metagenome Fragment Classification

Impact factor: 0.5 (JCR Q4, Computer Science, Information Systems)
A. Pekuwali, W. Kusuma, A. Buono
Journal of ICT Research and Applications. Journal Article, published 2018-09-28. DOI: 10.5614/ITBJ.ICT.RES.APPL.2018.12.2.2 (https://doi.org/10.5614/ITBJ.ICT.RES.APPL.2018.12.2.2)
Citations: 3

Abstract

K-mer frequencies are commonly used to extract features from metagenome fragments. However, researchers have found that this approach is still inefficient. In this research, a genetic algorithm was employed to find optimal spaced k-mers. Candidate spaced k-mers were generated as combinations of match positions and don't-care positions (written as *), an approach adopted from the concept of spaced seeds in PatternHunter. Using spaced k-mers can reduce the dimensionality of the k-mer frequency feature. To measure the accuracy of the proposed method, we used a naive Bayesian classifier (NBC). The results showed that the chromosome 111111110001, representing the spaced k-mer model [111 1111 10001], was the best chromosome, with a higher fitness (85.42) than that of the k-mer frequency feature. Moreover, the proposed approach also reduced the feature extraction time.
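The abstract describes spaced k-mers only in words, so the following is a minimal Python sketch of the idea, not the authors' implementation. It assumes (our reading, not stated in the abstract) that the 12-bit chromosome is cut into consecutive segments of lengths 3, 4 and 5, as the grouping [111 1111 10001] suggests; that '1' marks a match position and '0' a don't-care position (*); and that each pattern contributes the relative frequencies of the words read at its match positions. The names `decode_chromosome`, `spaced_kmer_counts` and `feature_vector` are hypothetical, and the genetic algorithm and naive Bayesian classifier are not shown.

```python
from itertools import product

ALPHABET = "ACGT"

# ASSUMPTION (not stated in the abstract): the 12-bit chromosome is cut into
# consecutive segments of length 3, 4 and 5, mirroring [111 1111 10001];
# '1' = match position, '0' = don't-care position (*).
SEGMENT_LENGTHS = (3, 4, 5)


def decode_chromosome(chromosome, lengths=SEGMENT_LENGTHS):
    """Hypothetical helper: split a bit-string chromosome into spaced k-mer patterns."""
    if len(chromosome) != sum(lengths):
        raise ValueError("chromosome length does not match the segment lengths")
    patterns, start = [], 0
    for n in lengths:
        patterns.append(chromosome[start:start + n])
        start += n
    return patterns


def spaced_kmer_counts(sequence, pattern):
    """Hypothetical helper: count words read at the '1' positions of each window."""
    sequence = sequence.upper()
    match_idx = [i for i, bit in enumerate(pattern) if bit == "1"]
    # One feature per possible word over the match positions: 4 ** weight features.
    counts = {"".join(w): 0 for w in product(ALPHABET, repeat=len(match_idx))}
    for start in range(len(sequence) - len(pattern) + 1):
        window = sequence[start:start + len(pattern)]
        key = "".join(window[i] for i in match_idx)
        if key in counts:  # skip windows with N or other non-ACGT bases at match positions
            counts[key] += 1
    return counts


def feature_vector(sequence, chromosome):
    """Hypothetical helper: concatenate relative frequencies of all decoded patterns."""
    vector = []
    for pattern in decode_chromosome(chromosome):
        counts = spaced_kmer_counts(sequence, pattern)
        total = sum(counts.values()) or 1  # avoid division by zero on very short fragments
        vector.extend(c / total for c in counts.values())
    return vector


if __name__ == "__main__":
    best = "111111110001"
    print(decode_chromosome(best))  # ['111', '1111', '10001']
    print(len(feature_vector("ACGTACGTTGCA", best)))  # 336
```

Under these assumptions the best chromosome yields 4^3 + 4^4 + 4^2 = 336 features. The pattern 10001 spans five bases but contributes only 4^2 = 16 features, whereas a contiguous 5-mer would contribute 4^5 = 1024; this is the kind of dimension reduction the abstract refers to.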
Source journal: Journal of ICT Research and Applications (Computer Science, Information Systems)
CiteScore: 1.60
Self-citation rate: 0.00%
Articles published: 13
Review time: 24 weeks
Journal scope: Journal of ICT Research and Applications welcomes full research articles in the area of Information and Communication Technology from the following subject areas: Information Theory, Signal Processing, Electronics, Computer Network, Telecommunication, Wireless & Mobile Computing, Internet Technology, Multimedia, Software Engineering, Computer Science, Information System and Knowledge Management. Authors are invited to submit articles that have not been published previously and are not under consideration elsewhere.