Dating ancient splits in phylogenetic trees, with application to the human-Neanderthal split.

Impact factor: 1.9 (JCR Q3, Genetics & Heredity)
Keren Levinstein Hallak, Saharon Rosset
BMC genomic data 25(1):4. Journal article, published 2024-01-02. DOI: 10.1186/s12863-023-01185-8 (https://doi.org/10.1186/s12863-023-01185-8). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10759710/pdf/

Abstract

Background: We tackle the problem of estimating species TMRCAs (Time to Most Recent Common Ancestor), given a genome sequence from each species and a large phylogenetic tree with known structure (typically from one of the species). The number of transitions at each site from the first sequence to the other is assumed to be Poisson distributed, and only the parity of the number of transitions is observed. The detailed phylogenetic tree contains information about the transition rate at each site. We use this formulation to develop and analyze multiple estimators of the species' TMRCA. To test our methods, we use mtDNA substitution statistics from the well-established Phylotree as a baseline for data simulation, so that the substitution rate per site mimics the rates observed in real data.
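The parity formulation can be illustrated with a minimal sketch. It is not one of the authors' estimators, only a toy under stated assumptions: if the number of transitions at a site is Poisson with mean `rate * t`, the probability that the count is odd (so the two sequences differ at that site) is `(1 - exp(-2 * rate * t)) / 2`. The function names, the synthetic per-site rates, and the coarse grid search are all hypothetical choices for illustration. Here `t` is the path length between the two sequences in the units of the rates; converting it to a TMRCA in years is outside the scope of the sketch.

```python
import math
import random


def prob_odd(rate, t):
    """P(N is odd) when N ~ Poisson(rate * t): (1 - exp(-2 * rate * t)) / 2."""
    return -0.5 * math.expm1(-2.0 * rate * t)


def simulate_parities(rates, t, rng):
    """One parity bit per site: 1 if the two sequences differ there, else 0."""
    return [1 if rng.random() < prob_odd(r, t) else 0 for r in rates]


def log_likelihood(t, rates, parities):
    """Log-likelihood of the observed parities for a candidate path length t."""
    total = 0.0
    for r, y in zip(rates, parities):
        p = prob_odd(r, t)
        total += math.log(p) if y else math.log1p(-p)
    return total


def estimate_t(rates, parities, t_min=0.01, t_max=100.0, n_grid=200):
    """Coarse maximum-likelihood estimate of t on a log-spaced grid."""
    step = (math.log(t_max) - math.log(t_min)) / (n_grid - 1)
    grid = [math.exp(math.log(t_min) + i * step) for i in range(n_grid)]
    return max(grid, key=lambda t: log_likelihood(t, rates, parities))


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic per-site rates (an assumption; not derived from Phylotree).
    rates = [rng.uniform(0.01, 0.5) for _ in range(3000)]
    parities = simulate_parities(rates, 1.0, rng)
    print(estimate_t(rates, parities))
```

Because the odd-parity probability saturates at 1/2 as `rate * t` grows, fast-evolving sites carry little information about large `t`. This is one reason per-site rate information matters for dating older splits.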

Results: We evaluate our methods using simulated data and compare them to the Bayesian phylogenetic software BEAST2, showing that our proposed estimators are accurate for a wide range of TMRCAs and significantly outperform BEAST2. We then apply the proposed estimators to Neanderthal, Denisovan, and Chimpanzee mtDNA genomes to better estimate their TMRCA with modern humans, and find that their TMRCA is substantially later than values recently cited in the literature.

Conclusions: Our methods use the transition statistics from the entire known human mtDNA phylogenetic tree (Phylotree), eliminating the need to reconstruct a tree encompassing the specific sequences of interest. Moreover, they show notable improvement in both running speed and accuracy compared to BEAST2, particularly for earlier TMRCAs such as the human-Chimpanzee split. Our results date the human-Neanderthal TMRCA to [Formula: see text] years ago, considerably later than values cited in other recent studies.
