A k-mer based metaheuristic approach for detecting COVID-19 variants

H. Arslan
Journal: DÜMF Mühendislik Dergisi
Volume/issue (as listed): 36 1
Published: 2023-03-23
Publication type: Journal Article
DOI: https://doi.org/10.24012/dumf.1195600
Citations: 1

Abstract

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) belongs to the Coronaviridae family. A change in its genetic sequence is called a mutation, and mutations give rise to SARS-CoV-2 variants. In this paper, we propose a novel and efficient method for predicting SARS-CoV-2 variants of concern from complete SARS-CoV-2 genome sequences sampled from human hosts. We describe 16 dinucleotide and 64 trinucleotide features, one per possible 2-mer (4^2 = 16) and 3-mer (4^3 = 64) over the four nucleotides, to differentiate the variants of concern. The usefulness of these features is demonstrated with four classifiers: k-nearest neighbor, support vector machine (SVM), multilayer perceptron, and random forest. The method is evaluated on a dataset of 223,326 complete genome sequences covering the designated variants of concern: Alpha, Beta, Gamma, Delta, and Omicron. The experimental results show that overall accuracy in detecting variants of concern increases markedly when trinucleotide rather than dinucleotide features are used. Furthermore, we use the whale optimization algorithm, a metaheuristic, to reduce the number of features and choose the most relevant ones. It selects 44 of the 64 trinucleotide features while still differentiating the variants with acceptable accuracy. With the selected features, the SVM classifier achieves about 99% accuracy, sensitivity, specificity, and precision on average.
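To make the feature definition concrete, here is a minimal sketch of how dinucleotide and trinucleotide features could be computed from a genome string. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not say whether features are raw counts or normalized frequencies, whether windows overlap, or how ambiguous bases are handled. The sketch assumes overlapping windows, normalized frequencies, and skipping of windows that contain non-ACGT characters. The names `kmer_features` and `select_features` and the boolean mask are hypothetical. The mask stands in for the output of a feature selector such as the whale optimization algorithm, which is not implemented here.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"  # the four nucleotides; 4**2 = 16 2-mers, 4**3 = 64 3-mers


def kmer_features(sequence, k):
    """Return normalized k-mer frequencies in lexicographic k-mer order.

    Assumptions (not stated in the abstract): overlapping windows,
    frequencies normalized to sum to 1, windows with non-ACGT bases skipped.
    """
    seq = sequence.upper()
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only windows made purely of A/C/G/T contribute to the total.
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def select_features(features, mask):
    """Keep only the features whose mask entry is truthy.

    `mask` is a hypothetical stand-in for a feature-selection result,
    e.g. 44 True entries out of 64 for the trinucleotide features.
    """
    return [f for f, keep in zip(features, mask) if keep]


if __name__ == "__main__":
    toy_genome = "ACGTACGTNNACGT"  # toy input, not real data
    di = kmer_features(toy_genome, 2)
    tri = kmer_features(toy_genome, 3)
    print(len(di), len(tri))
```

Under these assumptions, each genome becomes a fixed-length vector (16 or 64 values) regardless of its length, which is the form a classifier such as an SVM expects as input.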