Lifting scheme-based wavelet transform method for improved genomic classification and sequence analysis of Coronavirus
Subhajit Kar, Madhabi Ganguly, Supratik Sen
Innovation and Emerging Technologies (Journal Article), published 2023-01-01
DOI: 10.1142/s2737599423500020 (https://doi.org/10.1142/s2737599423500020)
Impact factor: 2.4; JCR: Q2 (Engineering, Multidisciplinary)
Citations: 0
Abstract
The paper proposes a lifting scheme-based wavelet transform clustering method as a better, alignment-free alternative to traditional alignment-based techniques for classifying and grouping virus genomes. The proposed algorithm has been tested on Coronavirus datasets obtained from the NCBI database and compared against established results from proven techniques. In the proposed approach, nucleotide sequences are converted into numerical sequences using purine–pyrimidine mapping, and a DNA walk is calculated to interpret them visually. A second-generation wavelet transform employing the Cohen–Daubechies–Feauveau wavelet is applied to the numerical Coronavirus sequences to determine the approximation coefficients. These coefficients are then used to cluster the sequences with a UPGMA phylogenetic tree for three datasets: Coronavirus groups, Human Coronaviruses (HCoVs), and the α–β–γ–δ Coronavirus genera. The algorithm classified the three datasets with an average accuracy of more than 97%, and it was compared in terms of complexity and accuracy against FFT, first-generation DWT, MEGA, and CLUSTAL-W. The accuracy obtained is 100% for the Coronavirus group dataset, 100% for the HCoV dataset, and 92% for the α–β–γ–δ CoV dataset. The runtimes of the algorithm are 0.70, 1.22, and 0.63 s for the respective datasets.
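The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions because the abstract does not specify them: the ±1 values of the purine–pyrimidine mapping, the choice of the CDF 5/3 member of the Cohen–Daubechies–Feauveau family, the mirrored boundary handling, the number of decomposition levels, and the Euclidean distance between coefficient vectors. The toy sequences are hypothetical and are not real Coronavirus genomes.

```python
import math

# Assumed mapping: purines (A, G) -> +1, pyrimidines (C, T/U) -> -1.
PURINE_PYRIMIDINE = {"A": 1, "G": 1, "C": -1, "T": -1, "U": -1}


def encode(seq):
    """Map a nucleotide string to +1/-1, skipping other symbols such as N."""
    return [PURINE_PYRIMIDINE[b] for b in seq.upper() if b in PURINE_PYRIMIDINE]


def dna_walk(signal):
    """Cumulative sum of the encoded signal, usable for plotting a DNA walk."""
    walk, total = [], 0
    for value in signal:
        total += value
        walk.append(total)
    return walk


def cdf53_lifting_step(x):
    """One level of an unnormalized CDF 5/3 lifting transform.

    Returns (approximation, detail). Boundaries are mirrored (assumption).
    """
    if len(x) < 2:
        raise ValueError("need at least two samples")
    s = [float(v) for v in x[0::2]]  # split: even-indexed samples
    d = [float(v) for v in x[1::2]]  # split: odd-indexed samples
    # Predict: replace each odd sample by its error against the even neighbours.
    for i in range(len(d)):
        right = s[i + 1] if i + 1 < len(s) else s[i]
        d[i] -= 0.5 * (s[i] + right)
    # Update: smooth the even samples using the neighbouring details.
    for i in range(len(s)):
        left = d[i - 1] if i > 0 else d[0]
        right = d[i] if i < len(d) else d[-1]
        s[i] += 0.25 * (left + right)
    return s, d


def approximation(x, levels):
    """Approximation coefficients after `levels` lifting steps."""
    for _ in range(levels):
        x, _detail = cdf53_lifting_step(x)
    return x


def distance(a, b):
    """Euclidean distance over the common length (zip truncates the longer)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


# Hypothetical toy sequences, for illustration only.
toy = {"seq1": "ATGCGTACGTTAGCCA", "seq2": "ATGCGTACGATAGCCA", "seq3": "CCTTCCTTAAGGAAGG"}
coeffs = {name: approximation(encode(s), levels=2) for name, s in toy.items()}
names = sorted(coeffs)
matrix = [[distance(coeffs[a], coeffs[b]) for b in names] for a in names]
for name, row in zip(names, matrix):
    print(name, [round(v, 3) for v in row])
```

The resulting pairwise distance matrix is the kind of input a UPGMA tree builder expects. With SciPy available, for example, `scipy.cluster.hierarchy.linkage` with `method="average"` performs UPGMA clustering.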