Hybrid analysis with phylogeny and population modeling to estimate the recent founding date of a population: A case study in the origins of COVID-19 illustrates how a branching process approximation can simplify a hybrid analysis

Impact factor: 1.9; CAS journal ranking: Zone 4 (Mathematics); JCR: Q2 (Biology)
John L. Spouge
Mathematical Biosciences, vol. 382, Article 109401 (journal article, published 2025-02-11). DOI: 10.1016/j.mbs.2025.109401. https://www.sciencedirect.com/science/article/pii/S0025556425000276
Citations: 0

Abstract

The exact date of the primary infection in COVID-19 remains unknown. One influential article (Pekar et al. (2021)) estimated the date with a hybrid analysis combining epidemiological and phylogenetic methods. The phylogenetic methods analyzed 583 complete SARS-CoV-2 genomes to estimate the sample tMRCA (time of the most recent common ancestor). Before igniting as an epidemic, however, COVID-19 may have passed through several population bottlenecks with only a single infected person, so the sample MRCA represents only the last such bottleneck. Pekar et al. (2021) therefore used epidemiological methods to estimate the time from the primary infection to the sample MRCA. The hybrid method involved several arbitrary decisions, however, reflecting the fact that the epidemiological and phylogenetic analyses overlap at the sample MRCA and are generally probabilistically dependent. Towards removing the dependence, note that the start of an epidemic has a branching process approximation. Let the branching process have a single ancestor. If the branching process does not go extinct, define skeleton particles (individuals) to be particles whose lineages do not go extinct, and define the long-time MRCA as the earliest skeleton particle with at least two skeleton offspring. A linear phylogeny of skeleton particles, each with exactly one skeleton offspring, therefore separates the ancestor from the long-time MRCA. Probabilistically, this linear phylogeny is a defective renewal process of skeleton particles, so the number of generations from the ancestor to the long-time MRCA is geometrically distributed. Moreover, the term “long-time MRCA” is apt, because as time becomes arbitrarily large, the MRCA of the population extant at that time approaches the long-time MRCA. Effectively, the focus on the long-time MRCA makes the forward epidemiological and backward phylogenetic analyses probabilistically independent. The present article can therefore confirm most of the epidemiological conclusions of the hybrid analysis of Pekar et al. (2021). Its use of branching process approximations also points the way to noticeable simplifications in the hybrid method.
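For concreteness, the geometric law can be made explicit under a discrete-generation Galton–Watson approximation. This is an assumption made here for illustration; the article's own model may differ. The derivation uses the standard skeleton decomposition. With offspring probability generating function f, mean R > 1 and extinction probability q, the skeleton particles form a Galton–Watson process in their own right. G denotes the number of generations from the ancestor to the long-time MRCA:

```latex
\begin{aligned}
f(s) &= \sum_{k \ge 0} p_k s^k, \qquad R = f'(1) > 1, \qquad q = \min\{s \in [0,1] : f(s) = s\},\\
f^{*}(s) &= \frac{f\bigl(q + (1-q)s\bigr) - q}{1-q} \qquad \text{(skeleton offspring pgf; } f^{*}(0) = 0\text{)},\\
p^{*}_{1} &= f^{*\prime}(0) = f'(q) < 1 \qquad \text{(probability of exactly one skeleton offspring)},\\
\Pr(G = n) &= f'(q)^{n}\,\bigl(1 - f'(q)\bigr), \qquad n = 0, 1, 2, \dots, \qquad \mathbb{E}[G] = \frac{f'(q)}{1 - f'(q)},\\
\text{Poisson}(R)\text{ offspring: } f(s) &= e^{R(s-1)} \;\Longrightarrow\; f'(q) = R\,e^{R(q-1)} = Rq.
\end{aligned}
```

For example, with Poisson offspring and R = 2, q ≈ 0.2032. A skeleton particle therefore has a single skeleton offspring with probability about 0.41, and E[G] is about 0.68 generations.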
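The geometric distribution of skeleton generations before the long-time MRCA can be checked numerically. The following is a minimal simulation sketch under stated assumptions. It uses a discrete-generation Galton–Watson process with Poisson(R) offspring, which is an illustrative choice and not necessarily the model used in the article or by Pekar et al. (2021). "Never goes extinct" is approximated by a lineage reaching a hypothetical population cap, and conditioning on non-extinction is done by rejection. All function names and the `cap` parameter are hypothetical.

```python
import math
import random


def extinction_probability(R, tol=1e-12, max_iter=10_000):
    """Smallest root of q = f(q) for Poisson(R) offspring, f(s) = exp(R(s - 1)).

    Iterating s -> f(s) from s = 0 increases monotonically to the smallest
    fixed point, which is the extinction probability q.
    """
    s = 0.0
    for _ in range(max_iter):
        s_next = math.exp(R * (s - 1.0))
        if abs(s_next - s) < tol:
            return s_next
        s = s_next
    return s


def prob_one_skeleton_child(R):
    """P(skeleton particle has exactly one skeleton child) = f'(q) = R*q (Poisson case)."""
    return R * extinction_probability(R)


def geometric_pmf(n, R):
    """Predicted P(G = n): n single-skeleton-child generations, then a branching."""
    p1 = prob_one_skeleton_child(R)
    return p1 ** n * (1.0 - p1)


def poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def lineage_survives(R, cap, rng):
    """Assumed proxy for non-extinction: the lineage reaches `cap` before dying out."""
    n = 1
    while 0 < n < cap:
        n = sum(poisson(R, rng) for _ in range(n))
    return n >= cap


def generations_to_long_time_mrca(R, rng, cap=30):
    """Skeleton generations from the ancestor to the first skeleton particle
    with at least two skeleton children."""
    g = 0
    while True:
        # Resample the current particle's offspring until at least one child's
        # lineage survives, i.e. condition on the particle being a skeleton particle.
        while True:
            k = poisson(R, rng)
            j = sum(lineage_survives(R, cap, rng) for _ in range(k))
            if j >= 1:
                break
        if j >= 2:
            return g  # this particle is the long-time MRCA
        g += 1  # exactly one skeleton child: move one generation down the chain


if __name__ == "__main__":
    R = 2.0
    rng = random.Random(1)
    samples = [generations_to_long_time_mrca(R, rng) for _ in range(1000)]
    p1 = prob_one_skeleton_child(R)
    print(f"q = {extinction_probability(R):.4f}, f'(q) = {p1:.4f}")
    print(f"mean G: simulated {sum(samples) / len(samples):.3f}, geometric {p1 / (1 - p1):.3f}")
```

The subtree simulations track only population counts rather than whole trees, which keeps the sketch short. With R = 2 and a cap of 30, the chance that a lineage reaching the cap later dies out is at most q^30, so the proxy has negligible bias.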
Source journal: Mathematical Biosciences (subject category: Biology)
CiteScore: 7.50
Self-citation rate: 2.30%
Articles published: 67
Review time: 18 days
About the journal: Mathematical Biosciences publishes work providing new concepts or new understanding of biological systems using mathematical models, or methodological articles likely to find application to multiple biological systems. Papers are expected to present a major research finding of broad significance for the biological sciences, or mathematical biology. Mathematical Biosciences welcomes original research articles, letters, reviews and perspectives.