On the importance of assessing topological convergence in Bayesian phylogenetic inference

ArXiv Pub Date : 2024-08-19
Marius Brusselmans, Luiz Max Carvalho, Samuel L Hong, Jiansi Gao, Frederick A Matsen, Andrew Rambaut, Philippe Lemey, Marc A Suchard, Gytis Dudas, Guy Baele
Citations: 0

Abstract

Modern phylogenetics research is often performed within a Bayesian framework, using sampling algorithms such as Markov chain Monte Carlo (MCMC) to approximate the posterior distribution. These algorithms require careful evaluation of the quality of the generated samples. Within the field of phylogenetics, one frequently adopted diagnostic approach is to evaluate the effective sample size (ESS) and to investigate trace graphs of the sampled parameters. A major limitation of these approaches is that they are developed for continuous parameters and therefore incompatible with a crucial parameter in these inferences: the tree topology. Several recent advancements have aimed at extending these diagnostics to topological space. In this reflection paper, we present two case studies - one on Ebola virus and one on HIV - illustrating how these topological diagnostics can contain information not found in standard diagnostics, and how decisions regarding which of these diagnostics to compute can impact inferences regarding MCMC convergence and mixing. Our results show the importance of running multiple replicate analyses and of carefully assessing topological convergence using the output of these replicate analyses. To this end, we illustrate different ways of assessing and visualizing the topological convergence of these replicates. Given the major importance of detecting convergence and mixing issues in Bayesian phylogenetic analyses, the lack of a unified approach to this problem warrants further action, especially now that additional tools are becoming available to researchers.
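
To make the contrast between continuous and topological diagnostics concrete, the following is a minimal sketch and not the diagnostics used in the paper. It assumes one simple way of carrying ESS over to tree space: map each sampled tree to its Robinson-Foulds distance from a reference tree and compute an autocorrelation-based ESS on that distance trace. The tree representation (a rooted tree as a set of clades), the function names, and the truncation rule for the autocorrelation sum are all illustrative assumptions.

```python
from typing import FrozenSet, List, Sequence, Set

Clade = FrozenSet[str]
Tree = Set[Clade]  # assumed representation: a rooted tree as its set of clades


def autocorrelation(x: Sequence[float], lag: int) -> float:
    """Sample autocorrelation of x at the given lag (biased estimator)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    if var == 0:
        # A constant trace has no measurable autocorrelation; treated as 0 here.
        return 0.0
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag)) / n
    return cov / var


def effective_sample_size(x: Sequence[float]) -> float:
    """ESS = N / (1 + 2 * sum of autocorrelations).

    The sum is truncated at the first non-positive autocorrelation,
    a simplifying assumption made for this sketch.
    """
    n = len(x)
    tau = 1.0
    for lag in range(1, n):
        rho = autocorrelation(x, lag)
        if rho <= 0:
            break
        tau += 2.0 * rho
    return n / tau


def rf_distance(a: Tree, b: Tree) -> int:
    """Robinson-Foulds distance for rooted trees: clades present in only one tree."""
    return len(a ^ b)


def distance_trace(trees: List[Tree], reference: Tree) -> List[int]:
    """Turn a sample of topologies into a numeric trace of distances to a reference."""
    return [rf_distance(t, reference) for t in trees]


# Two toy topologies on taxa A, B, C (hypothetical data).
tree_1: Tree = {frozenset("AB"), frozenset("ABC")}
tree_2: Tree = {frozenset("BC"), frozenset("ABC")}

# One chain that switches topology only once, one that switches at every sample.
stuck_chain = [tree_1] * 50 + [tree_2] * 50
mixing_chain = [tree_1, tree_2] * 50

ess_stuck = effective_sample_size(distance_trace(stuck_chain, tree_1))
ess_mixing = effective_sample_size(distance_trace(mixing_chain, tree_1))
```

Both toy chains visit each topology 50 times, yet the chain that switches only once gets a far lower ESS on the distance trace than the one that alternates. Note also that a chain stuck in a single topology produces a constant trace, which this sketch cannot flag at all; that is one reason comparing multiple replicate analyses, as the abstract recommends, matters.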
