Lifting scheme-based wavelet transform method for improved genomic classification and sequence analysis of Coronavirus

IF 2.4 Q2 ENGINEERING, MULTIDISCIPLINARY
Subhajit Kar, Madhabi Ganguly, Supratik Sen
DOI: 10.1142/s2737599423500020
Journal: Innovation and Emerging Technologies (journal article, 2023)
Link: https://doi.org/10.1142/s2737599423500020
Citations: 0

Abstract

The paper proposes a lifting scheme-based wavelet transform clustering method as a better alternative to traditional alignment-based techniques for classifying and grouping virus genomes. The efficiency of the proposed alignment-free algorithm has been tested on Coronavirus datasets obtained from the NCBI database, against established results from proven techniques. In the proposed approach, nucleotide sequences are converted into numerical sequences using purine–pyrimidine mapping, and a DNA walk is computed from them for visual interpretation. A second-generation (lifting-scheme) wavelet transform with the Cohen–Daubechies–Feauveau wavelet is then applied to the numerical Coronavirus sequences to obtain their approximation coefficients. These coefficients are used to cluster the sequences into UPGMA phylogenetic trees for three Coronavirus datasets: Coronavirus groups, Human Coronaviruses (HCoVs), and the α-, β-, γ- and δ-coronavirus genera. The algorithm was compared with FFT, first-generation DWT, MEGA and CLUSTAL-W in terms of complexity and accuracy, and it classified all three datasets with an average accuracy above 97%: 100% for the Coronavirus-group dataset, 100% for the HCoV dataset and 92% for the α-, β-, γ-, δ-CoV dataset. Its runtimes on the three datasets are 0.70 s, 1.22 s and 0.63 s, respectively.
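The abstract lists the pipeline stages without giving their details. The following is a minimal pure-Python sketch of those stages, not the authors' implementation, and it rests on several assumptions that the abstract does not state: purines (A, G) are coded as +1 and pyrimidines (C, T/U) as -1; the Cohen–Daubechies–Feauveau wavelet is taken to be the CDF 5/3 filter pair, implemented as one predict and one update lifting step; three decomposition levels are used; the approximation coefficients are linearly resampled to a common length of 64 so that genomes of different lengths can be compared; and sequences are compared by Euclidean distance. All function names, parameters and toy sequences are hypothetical.

```python
import math
from itertools import accumulate

# Assumed purine-pyrimidine code: purines A/G -> +1, pyrimidines C/T/U -> -1.
PURINE_PYRIMIDINE = {"A": 1.0, "G": 1.0, "C": -1.0, "T": -1.0, "U": -1.0}


def encode(seq):
    """Map a nucleotide string to +/-1; ambiguous symbols such as N are skipped."""
    return [PURINE_PYRIMIDINE[b] for b in seq.upper() if b in PURINE_PYRIMIDINE]


def dna_walk(signal):
    """DNA walk: running sum of the +/-1 signal."""
    return list(accumulate(signal))


def cdf53_step(x):
    """One level of the CDF 5/3 lifting transform (symmetric boundary extension).

    Returns (approximation, detail) coefficients, each half the (padded) input length.
    """
    if len(x) % 2:
        x = x + [x[-1]]  # pad odd-length input by repeating the last sample
    even, odd = x[0::2], x[1::2]
    n = len(even)
    # Predict: detail = odd sample minus the mean of its two even neighbours.
    d = [odd[i] - 0.5 * (even[i] + even[min(i + 1, n - 1)]) for i in range(n)]
    # Update: approximation = even sample plus a quarter of the two adjacent details.
    s = [even[i] + 0.25 * (d[max(i - 1, 0)] + d[i]) for i in range(n)]
    return s, d


def approximation(x, levels):
    """Keep only the approximation branch after `levels` lifting steps."""
    for _ in range(levels):
        x, _ = cdf53_step(x)
    return x


def resample(x, length):
    """Linear interpolation to a fixed length (hypothetical step for unequal genome sizes)."""
    if len(x) == 1:
        return [x[0]] * length
    out = []
    for k in range(length):
        pos = k * (len(x) - 1) / (length - 1)
        i = min(int(pos), len(x) - 2)
        frac = pos - i
        out.append(x[i] * (1 - frac) + x[i + 1] * frac)
    return out


def features(seq, levels=3, length=64):
    return resample(approximation(dna_walk(encode(seq)), levels), length)


def upgma(vectors, labels):
    """UPGMA (size-weighted average-linkage) clustering on Euclidean distances.

    Returns the tree topology as a Newick string (branch lengths omitted).
    """
    clusters = {i: (labels[i], 1) for i in range(len(labels))}  # id -> (newick, size)
    dist = {(i, j): math.dist(vectors[i], vectors[j])
            for i in range(len(vectors)) for j in range(i + 1, len(vectors))}
    next_id = len(labels)
    while len(clusters) > 1:
        (a, b), _ = min(dist.items(), key=lambda kv: kv[1])  # closest pair, a < b
        (na, sa), (nb, sb) = clusters.pop(a), clusters.pop(b)
        del dist[(a, b)]
        for c in clusters:
            # Size-weighted mean distance from the merged cluster to cluster c.
            dac = dist.pop((min(a, c), max(a, c)))
            dbc = dist.pop((min(b, c), max(b, c)))
            dist[(c, next_id)] = (sa * dac + sb * dbc) / (sa + sb)
        clusters[next_id] = (f"({na},{nb})", sa + sb)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"


if __name__ == "__main__":
    # Toy sequences for illustration only; they are not real Coronavirus genomes.
    toy = {
        "seqA": "ATGCGATACGCTTGACGATCGATCGGATCCA" * 4,
        "seqB": "ATGCGATACGCTTGACGATCGATCGGATCCT" * 4,
        "seqC": "TTTCCCTTTCCCAAAGGGTTTCCCAAAGGGC" * 4,
    }
    print(upgma([features(s) for s in toy.values()], list(toy)))
    # -> (seqC,(seqA,seqB));
```

The UPGMA step is plain size-weighted average linkage, the same criterion SciPy exposes as `method="average"` in `scipy.cluster.hierarchy.linkage`. The resampling step is only one simple way to handle unequal genome lengths; the paper may handle this differently.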