Classification of a Massive Number of Viral Genomes and Estimation of Time of Most Recent Common Ancestor (tMRCA) of SARS-CoV-2 Using Phylodynamic Analysis.
Authors: Xiaowen Hu, Siqin Guan, Yiliang He, Guohui Yi, Lei Yao, Jiaming Zhang
Journal: Bio-protocol. Published: 2024-03-20. DOI: 10.21769/BioProtoc.4955 (https://doi.org/10.21769/BioProtoc.4955)
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10958167/pdf/
Abstract
Estimating the time of the most recent common ancestor (tMRCA) is important for tracing the origin of pathogenic viruses. The estimate relies on the genetic diversity accumulated over a given time period. Thousands of mutation sites have arisen in SARS-CoV-2 genomes since the COVID-19 pandemic started; six highly linked mutation sites arose early, before the start of the pandemic, and can be used to classify the genomes into three main haplotypes. Tracing the origin of these three haplotypes may help clarify the origin of SARS-CoV-2. In this article, we present a complete protocol for classifying SARS-CoV-2 genomes and calculating the tMRCA using a Bayesian phylodynamic method. This protocol may also be used to analyze other viral genomes.

Key features
• Filtering and alignment of a massive number of viral genomes using custom scripts and ViralMSA.
• Classification of genomes based on highly linked sites using custom scripts.
• Phylodynamic analysis of viral genomes using Bayesian evolutionary analysis sampling trees (BEAST).
• Visualization of the posterior distribution of tMRCA using Tracer v1.7.2.
• Optimized for SARS-CoV-2.
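The classification step, assigning each aligned genome to a haplotype according to its alleles at a small set of linked sites, can be illustrated with a minimal Python sketch. This is not the authors' custom script: the site positions, allele patterns, haplotype names, and toy sequences below are hypothetical placeholders chosen only for illustration (three sites and two haplotypes for brevity, whereas the protocol uses six sites and three haplotypes). The sketch assumes the genomes have already been aligned to a common reference so that column indices are comparable.

```python
from collections import Counter

# Hypothetical placeholders: 0-based alignment columns of the linked sites
# and the allele pattern that defines each haplotype. The real sites and
# alleles are defined in the protocol itself, not here.
LINKED_SITES = [2, 5, 7]
HAPLOTYPES = {
    "hap_A": "ACG",
    "hap_B": "TTA",
}


def classify(aligned_seq, sites=LINKED_SITES, haplotypes=HAPLOTYPES):
    """Return the haplotype whose allele pattern matches the sequence
    at the linked sites, or "unclassified" if none matches."""
    pattern = "".join(aligned_seq[i].upper() for i in sites)
    for name, alleles in haplotypes.items():
        if pattern == alleles:
            return name
    return "unclassified"


# Toy aligned sequences (invented for this example).
genomes = {
    "g1": "GGACCCGGAA",
    "g2": "GGTCCTGAAA",
    "g3": "GGACCTGGAA",  # mixed pattern, matches neither haplotype
}

assignments = {gid: classify(seq) for gid, seq in genomes.items()}
counts = Counter(assignments.values())

for gid, hap in assignments.items():
    print(gid, hap)
```

Genomes whose pattern matches no defined haplotype are reported separately here rather than forced into the nearest class; whether to keep or discard such genomes before the phylodynamic analysis is a design choice this sketch leaves open.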