Robustness of Divergence Time Estimation Despite Gene Tree Estimation Error: A Case Study of Fireflies (Coleoptera: Lampyridae)

IF 6.1 · Zone 1 (Biology) · Q1 Evolutionary Biology
Sebastian Höhna, Sarah E Lower, Pablo Duchen, Ana Catalán
Journal: Systematic Biology
Publication date: 2024-11-13
Publication type: Journal Article
DOI: 10.1093/sysbio/syae065 (https://doi.org/10.1093/sysbio/syae065)
Citations: 0

Abstract

Genomic data have become ubiquitous in phylogenomic studies, including divergence time estimation, but they bring new challenges. These challenges include, amongst others, biological gene tree discordance, methodological gene tree estimation error, and computational limitations on performing full Bayesian inference under complex models. In this study, we use a recently published firefly (Coleoptera: Lampyridae) anchored hybrid enrichment dataset (AHE; 436 loci for 88 Lampyridae species and 10 outgroup species) as a case study to explore gene tree estimation error and the robustness of divergence time estimation. First, we explored the amount of model violation using posterior predictive simulations, because model violations are likely to bias phylogenetic inferences and produce gene tree estimation error. We specifically focused on missing data (either uniformly or systematically distributed) and the distribution of highly variable and conserved sites (either uniformly distributed or clustered). Our assessment of model adequacy showed that standard phylogenetic substitution models are not adequate for any of the 436 AHE loci. We tested whether the model violations and alignment errors indeed resulted in gene tree estimation error by comparing the observed gene tree discordance to simulated gene tree discordance under the multispecies coalescent model. Thus, we show that the inferred gene tree discordance is not only due to biological mechanisms but primarily due to inference errors. Lastly, we explored whether divergence time estimation is robust despite the observed gene tree estimation error. We selected four subsets of the full AHE dataset, concatenated each subset, and performed a Bayesian relaxed clock divergence time estimation in RevBayes. The estimated divergence times overlapped for all nodes that are shared between the topologies. Thus, divergence time estimation is robust using any well-selected data subset as long as the topology inference is robust.
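
The robustness criterion stated in the abstract, that estimated divergence times overlap for all shared nodes, can be illustrated with a minimal Python sketch. It assumes that each data subset yields a credible interval (lower and upper bound, in million years) for every shared node. The node names, the numbers, and the functions `intervals_overlap` and `robust_nodes` are hypothetical illustrations and are not taken from the study or from RevBayes.

```python
from typing import Dict, List, Tuple

# (lower, upper) bounds of a credible interval for a node age
Interval = Tuple[float, float]


def intervals_overlap(intervals: List[Interval]) -> bool:
    """Return True if all intervals share at least one common point.

    For one-dimensional intervals this is equivalent to every pair
    of intervals overlapping.
    """
    highest_lower = max(lower for lower, _ in intervals)
    lowest_upper = min(upper for _, upper in intervals)
    return highest_lower <= lowest_upper


def robust_nodes(estimates: Dict[str, List[Interval]]) -> Dict[str, bool]:
    """Map each shared node to whether its intervals overlap across subsets."""
    return {node: intervals_overlap(ivs) for node, ivs in estimates.items()}


# Hypothetical intervals from four data subsets (NOT values from the paper)
example = {
    "node_A": [(90.0, 120.0), (95.0, 130.0), (85.0, 110.0), (100.0, 125.0)],
    "node_B": [(40.0, 55.0), (60.0, 75.0), (45.0, 58.0), (50.0, 65.0)],
}
result = robust_nodes(example)
```

In this made-up example, `node_A` would count as consistent across subsets, while `node_B` would not, because one subset's interval lies entirely above another's.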
Source journal
Systematic Biology (Biology: Evolutionary Biology)
CiteScore: 13.00
Self-citation rate: 7.70%
Articles published: 70
Review time: 6-12 weeks
About the journal: Systematic Biology is the bimonthly journal of the Society of Systematic Biologists. Papers for the journal are original contributions to the theory, principles, and methods of systematics as well as phylogeny, evolution, morphology, biogeography, paleontology, genetics, and the classification of all living things. A Points of View section offers a forum for discussion, while book reviews and announcements of general interest are also featured.