Rezwanuzzaman Laskar, Md Gulam Jilani, Taslima Nasrin, Safdar Ali
Microsatellite Signature of Reference Genome Sequence of SARS-CoV-2 and 32 Species of Coronaviridae Family
International Journal of Infection, published 2022-05-31
DOI: 10.5812/iji-122019 (https://doi.org/10.5812/iji-122019)
Citations: 3
Abstract
Background: Simple sequence repeats (SSRs) are repeats of 1 to 6 bp motifs found in both prokaryotic and eukaryotic genomes. They have various clinical implications and also serve as tools for conservation and evolutionary studies.
Objectives: To analyze 33 coronavirus genomes, including severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), for the incidence, distribution, and complexity of SSR patterns in order to understand their role in host divergence and evolution.
Methods: Full-length genome sequences were retrieved from the National Center for Biotechnology Information (NCBI). Microsatellites were extracted using the Imperfect Microsatellite Extractor (IMEx) in "Advanced Mode". Sequences were aligned with MAFFT v6.861b, and the maximum likelihood tree was inferred using RAxML v8.1.20 under the GTR + GAMMA + I model with default settings.
Results: A total of 3,442 SSRs and 136 complex sequence repeats (cSSRs) were extracted from the 33 genomes studied. SSR incidence ranged from 82 (CV09) to 144 (CV60), and cSSR incidence ranged from 1 (CV42, CV43, CV53) to 11 (CV32). CV61 (SARS-CoV-2) had 107 SSRs and 6 cSSRs. Di-nucleotide motifs were the most prevalent, followed by tri- and mono-nucleotide motifs. TG/GT was the most represented di-nucleotide motif, followed by CA/AC. Among tri-nucleotide SSRs, ACA/TGT was the most represented motif, followed by CAA/GTT, whereas among mono-nucleotide SSRs, T was the most frequently observed nucleotide, followed by A. About 94% of SSRs were localized to the coding region. Twenty species, including CV61 (SARS-CoV-2), exhibited mono-nucleotide repeats exclusively in the A/T region, and these species clustered together in the phylogenetic analysis. Sequence similarity among the genomes was assessed through heat map analysis, which showed that similar sequences were, as expected, placed close together on the phylogenetic tree.
Conclusions: The exclusivity of mono-nucleotide repeats to the A/T region, together with the SSR genome signature, may provide a basis for predicting the evolution of viruses in terms of host range.
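
To make the kind of SSR scan described in the Methods concrete, the following is a minimal Python sketch that finds perfect repeats of 1 to 6 bp motifs in a nucleotide sequence and tallies them by motif length. It is not IMEx and does not reproduce the study's settings: IMEx in "Advanced Mode" also handles imperfect repeats and cSSRs, which this sketch ignores. The function names and the minimum repeat thresholds in `MIN_REPEATS` are assumptions chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical minimum repeat counts per motif length (1 to 6 bp).
# These are illustrative thresholds, NOT the IMEx parameters used in the study.
MIN_REPEATS = {1: 6, 2: 3, 3: 3, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """Return False when the motif is itself a repeat of a shorter unit (e.g. ATAT)."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_perfect_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return sorted (start, end, motif, repeat_count) tuples for perfect SSRs.

    Coordinates are 0-based and end-exclusive. Hits of the same motif
    length never overlap each other; imperfect repeats are not detected.
    """
    sequence = sequence.upper()
    hits = []
    for size, min_rep in sorted(min_repeats.items()):
        # One motif of `size` bases followed by at least (min_rep - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        pos = 0
        while True:
            match = pattern.search(sequence, pos)
            if match is None:
                break
            motif = match.group(1)
            if is_primitive(motif):
                repeats = (match.end() - match.start()) // size
                hits.append((match.start(), match.end(), motif, repeats))
                pos = match.end()
            else:
                # e.g. "AA" found by the di-nucleotide pattern: skip one base
                # so a real repeat starting inside this run is not missed.
                pos = match.start() + 1
    return sorted(hits)


def count_by_motif_length(hits):
    """Tally hits by motif length (1 = mono-, 2 = di-, 3 = tri-nucleotide, ...)."""
    return Counter(len(motif) for _, _, motif, _ in hits)


if __name__ == "__main__":
    # Toy sequence for demonstration only; not taken from any studied genome.
    toy = "GCA" + "T" * 7 + "GCA" + "TG" * 4 + "CC" + "ACA" * 3 + "G"
    for hit in find_perfect_ssrs(toy):
        print(hit)
```

Applied to a full genome read from a FASTA file, the same two functions would give per-genome SSR counts and a breakdown by motif length, which is the shape of the incidence and prevalence figures reported in the Results; the actual numbers would depend on the thresholds chosen and would not be expected to match the IMEx output.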