Hybrid analysis with phylogeny and population modeling to estimate the recent founding date of a population: A case study in the origins of COVID-19 illustrates how a branching process approximation can simplify a hybrid analysis
Author: John L. Spouge
Journal: Mathematical Biosciences, volume 382, Article 109401 (impact factor 1.9; JCR Q2, Biology)
Published: 2025-02-11
DOI: 10.1016/j.mbs.2025.109401
URL: https://www.sciencedirect.com/science/article/pii/S0025556425000276
Citation count: 0
Abstract
The exact date of the primary infection in COVID-19 remains unknown. One influential article (Pekar et al., 2021) estimated the date with a hybrid analysis combining epidemiological and phylogenetic methods. The phylogenetic methods analyzed 583 complete SARS-CoV-2 genomes to estimate the sample tMRCA (time of the most recent common ancestor). Before igniting as an epidemic, however, COVID-19 may have passed through several population bottlenecks with only a single infected person, so the MRCA merely represents the last such bottleneck. Pekar et al. (2021) therefore used epidemiological methods to estimate the time from the primary infection to the sample MRCA. The hybrid method involved several arbitrary decisions, however, because the epidemiological and phylogenetic analyses overlap at the sample MRCA and are generally probabilistically dependent. To remove the dependence, note that the start of an epidemic has a branching process approximation. Let the branching process have a single ancestor. If the branching process does not go extinct, define skeleton particles (individuals) to be particles whose lineages do not go extinct, and define the long-time MRCA as the earliest skeleton particle with at least two skeleton offspring. A linear phylogeny of skeleton particles therefore separates the ancestor from the long-time MRCA. Probabilistically, the linear phylogeny is a defective renewal process of skeleton particles, so the generation count is geometrically distributed. Moreover, the term “long-time MRCA” is apt, because as time becomes arbitrarily large, the MRCA of the corresponding extant population approaches the long-time MRCA. In effect, the focus on the long-time MRCA makes the forward epidemiological and backward phylogenetic analyses probabilistically independent. The present article can therefore confirm most of the epidemiological conclusions of the hybrid analysis of Pekar et al. (2021). Its use of branching process approximations also points the way to noticeable simplifications in the hybrid method.
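The geometric generation count mentioned in the abstract can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the article: a discrete-generation Galton-Watson process with Poisson offspring of mean `lam`, and hypothetical function names chosen for illustration. Under these assumptions the extinction probability q is the smallest root of q = f(q), where f(s) = exp(lam (s - 1)) is the offspring generating function, and a skeleton particle has exactly one skeleton offspring with probability f'(q) = lam q. The number of single-offspring skeleton particles preceding the long-time MRCA (counting from the ancestor, an indexing convention assumed here) is then geometric.

```python
import math


def extinction_probability(lam, tol=1e-14, max_iter=100_000):
    """Smallest root q of q = exp(lam * (q - 1)) for Poisson(lam) offspring.

    Assumes a supercritical process (lam > 1); otherwise extinction is
    certain and no skeleton exists.
    """
    if lam <= 1.0:
        raise ValueError("lam must exceed 1 for a skeleton to exist")
    q = 0.0
    # Fixed-point iteration from 0 increases monotonically to the smallest root.
    for _ in range(max_iter):
        q_next = math.exp(lam * (q - 1.0))
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q


def single_skeleton_offspring_probability(lam):
    """P(a skeleton particle has exactly one skeleton offspring) = f'(q) = lam * q."""
    return lam * extinction_probability(lam)


def generation_count_pmf(lam, k):
    """P(exactly k single-offspring skeleton particles precede the long-time MRCA)."""
    p1 = single_skeleton_offspring_probability(lam)
    return (p1 ** k) * (1.0 - p1)


if __name__ == "__main__":
    lam = 2.0  # illustrative mean offspring number, not a value from the article
    p1 = single_skeleton_offspring_probability(lam)
    print("extinction probability:", extinction_probability(lam))
    print("P(one skeleton offspring):", p1)
    print("mean generation count:", p1 / (1.0 - p1))
```

The article itself may use a continuous-time model and a different offspring law; only the geometric form of the count is drawn from the abstract.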
About the journal:
Mathematical Biosciences publishes work providing new concepts or new understanding of biological systems using mathematical models, or methodological articles likely to find application to multiple biological systems. Papers are expected to present a major research finding of broad significance for the biological sciences, or mathematical biology. Mathematical Biosciences welcomes original research articles, letters, reviews and perspectives.