Lifting scheme-based wavelet transform method for improved genomic classification and sequence analysis of Coronavirus

Impact factor: 0.8 | JCR Q2, Engineering, Multidisciplinary

Innovation and Emerging Technologies | Pub Date: 2023-01-01 | DOI: 10.1142/s2737599423500020

Subhajit Kar, Madhabi Ganguly, Supratik Sen


Citations: 0

Abstract


The paper proposes a lifting scheme-based wavelet transform clustering method as a better alternative to traditional alignment-based techniques for classifying and grouping virus genomes. The efficiency of the proposed alignment-free algorithm has been tested on Coronavirus datasets obtained from the NCBI database, against established results from proven techniques. In the proposed approach, nucleotide sequences are converted into numerical sequences using a purine–pyrimidine mapping, and a DNA walk is computed to interpret them visually. A second-generation wavelet transform employing the Cohen–Daubechies–Feauveau wavelet is applied to the numerical Coronavirus sequences to obtain their approximation coefficients. These coefficients are then used to cluster the sequences with a UPGMA phylogenetic tree for three Coronavirus datasets: Coronavirus groups, Human Coronaviruses (HCoVs), and the α–β–γ–δ Coronavirus genera. The proposed algorithm classified all datasets with an average accuracy above 97%, and it was compared in terms of complexity and accuracy against FFT, first-generation DWT, MEGA, and CLUSTAL-W. The obtained accuracy is 100% for the Coronavirus group dataset, 100% for the HCoV dataset, and 92% for the α–β–γ–δ CoV dataset. The runtimes of the algorithm are 0.70, 1.22, and 0.63 s for the respective datasets.
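The numerical encoding and DNA walk described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the sign convention (purines A/G mapped to -1, pyrimidines C/T/U mapped to +1, as in the classic DNA-walk formulation) and the choice to skip ambiguous IUPAC symbols are assumptions, and the function names are hypothetical.

```python
import numpy as np

# Assumed convention: purines (A, G) -> -1, pyrimidines (C, T, and U for RNA) -> +1.
# The paper may use the opposite signs; that would only mirror the DNA walk.
PURINE_PYRIMIDINE = {"A": -1.0, "G": -1.0, "C": 1.0, "T": 1.0, "U": 1.0}


def purine_pyrimidine_map(seq: str) -> np.ndarray:
    """Map a nucleotide string to a +/-1 sequence.

    Ambiguous symbols (N, R, Y, ...) are skipped; this is an assumption,
    not something the abstract specifies.
    """
    return np.array(
        [PURINE_PYRIMIDINE[b] for b in seq.upper() if b in PURINE_PYRIMIDINE],
        dtype=float,
    )


def dna_walk(numeric: np.ndarray) -> np.ndarray:
    """Cumulative sum y(n) = u(1) + ... + u(n); plotting y against n gives the DNA walk."""
    return np.cumsum(numeric)


if __name__ == "__main__":
    u = purine_pyrimidine_map("ACGTTA")
    print(u.tolist())            # [-1.0, 1.0, -1.0, 1.0, 1.0, -1.0]
    print(dna_walk(u).tolist())  # [-1.0, 0.0, -1.0, 0.0, 1.0, 0.0]
```

A walk that drifts upward marks a pyrimidine-rich stretch and a downward drift a purine-rich one, which is what makes the plot useful for visual comparison of genomes.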
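A second-generation (lifting) wavelet transform replaces filter-bank convolution with three in-place steps: split the signal into even and odd samples, predict the odd samples from the even ones, and update the even samples with the prediction residuals. The sketch below assumes the CDF 5/3 member of the Cohen–Daubechies–Feauveau family, because the abstract does not state which CDF filter or how many decomposition levels were used. It also assumes no normalization, whole-sample symmetric extension at the borders, and padding of odd-length input by repeating the last sample. All function names are hypothetical, and the inverse is included only to show that lifting is exactly invertible.

```python
import numpy as np


def _left_neighbour(d: np.ndarray) -> np.ndarray:
    """d[n-1] for every n, with symmetric extension d[-1] := d[0]."""
    return np.concatenate(([d[0]], d[:-1]))


def _right_neighbour(e: np.ndarray) -> np.ndarray:
    """e[n+1] for every n, with symmetric extension e[len] := e[len-1]."""
    return np.concatenate((e[1:], [e[-1]]))


def cdf53_forward(x) -> tuple[np.ndarray, np.ndarray]:
    """One level of the unnormalized CDF 5/3 lifting transform; returns (approximation, detail)."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:                                    # assumption: pad odd length with the last sample
        x = np.append(x, x[-1])
    even, odd = x[0::2], x[1::2]                      # split
    d = odd - 0.5 * (even + _right_neighbour(even))   # predict: residual of linear interpolation
    s = even + 0.25 * (_left_neighbour(d) + d)        # update: correct evens with a quarter of neighbouring details
    return s, d


def cdf53_inverse(s: np.ndarray, d: np.ndarray) -> np.ndarray:
    """Undo cdf53_forward by running the lifting steps backwards with flipped signs."""
    even = s - 0.25 * (_left_neighbour(d) + d)
    odd = d + 0.5 * (even + _right_neighbour(even))
    x = np.empty(2 * len(s))
    x[0::2], x[1::2] = even, odd
    return x


def approximation_coefficients(x, levels: int) -> np.ndarray:
    """Keep only the approximation branch over `levels` decomposition levels."""
    a = np.asarray(x, dtype=float)
    for _ in range(levels):
        if len(a) < 2:
            break
        a, _ = cdf53_forward(a)
    return a
```

Under these assumptions, a genome mapped to ±1 values would be passed to `approximation_coefficients`. Each level halves the vector length, so the number of levels trades resolution for a more compact feature vector. For odd-length input, `cdf53_inverse` returns the padded signal, one sample longer than the original.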
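UPGMA is average-linkage hierarchical clustering. At each step it merges the two closest clusters and sets the distance from the merged cluster to every other cluster to the size-weighted mean of the previous distances. The sketch below builds such a tree from coefficient vectors and returns it in Newick format. Because genomes differ in length, it resamples every vector to a common length by linear interpolation and uses Euclidean distance. Both choices are assumptions for illustration, since the abstract does not say how the authors compared coefficient vectors, and all names are hypothetical.

```python
import numpy as np


def resample(v, n: int) -> np.ndarray:
    """Linearly interpolate vector v onto n evenly spaced points (assumed length normalization)."""
    v = np.asarray(v, dtype=float)
    return np.interp(np.linspace(0, len(v) - 1, n), np.arange(len(v)), v)


def distance_matrix(vectors: list, n: int = 256) -> np.ndarray:
    """Pairwise Euclidean distances between resampled coefficient vectors."""
    f = np.stack([resample(v, n) for v in vectors])
    return np.linalg.norm(f[:, None, :] - f[None, :, :], axis=-1)


def upgma(dist: np.ndarray, labels: list[str]) -> str:
    """Build a UPGMA tree from a symmetric distance matrix and return it as a Newick string."""
    # cluster id -> (Newick subtree, number of leaves, node height)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(labels)}
    # distances between live clusters, keyed by (smaller id, larger id)
    d = {(i, j): float(dist[i, j]) for i in range(len(labels)) for j in range(i + 1, len(labels))}
    next_id = len(labels)
    while len(clusters) > 1:
        (i, j), dij = min(d.items(), key=lambda kv: kv[1])  # closest pair of clusters
        (ni, si, hi), (nj, sj, hj) = clusters.pop(i), clusters.pop(j)
        h = dij / 2.0                                       # ultrametric height of the new node
        del d[(i, j)]
        for k in clusters:                                  # size-weighted average linkage
            dik = d.pop((min(i, k), max(i, k)))
            djk = d.pop((min(j, k), max(j, k)))
            d[(k, next_id)] = (si * dik + sj * djk) / (si + sj)
        clusters[next_id] = (f"({ni}:{h - hi:.4g},{nj}:{h - hj:.4g})", si + sj, h)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"
```

The hand-written loop keeps the merge rule visible. In practice, SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` applies the same UPGMA criterion to a condensed distance matrix.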
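Assuming the reported average is the unweighted mean over the three datasets, the per-dataset accuracies agree with the headline figure:

```latex
\bar{A} = \frac{100\% + 100\% + 92\%}{3} = \frac{292\%}{3} \approx 97.3\% > 97\%
```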

