Finding high posterior density phylogenies by systematically extending a directed acyclic graph.

IF 1.7 4区生物学 Q4 BIOCHEMICAL RESEARCH METHODS

Algorithms for Molecular Biology Pub Date : 2025-02-28 DOI:10.1186/s13015-025-00273-x

Chris Jennings-Shaffer, David H Rich, Matthew Macaulay, Michael D Karcher, Tanvi Ganapathy, Shosuke Kiami, Anna Kooperberg, Cheng Zhang, Marc A Suchard, Frederick A Matsen

{"title":"Finding high posterior density phylogenies by systematically extending a directed acyclic graph.","authors":"Chris Jennings-Shaffer, David H Rich, Matthew Macaulay, Michael D Karcher, Tanvi Ganapathy, Shosuke Kiami, Anna Kooperberg, Cheng Zhang, Marc A Suchard, Frederick A Matsen","doi":"10.1186/s13015-025-00273-x","DOIUrl":null,"url":null,"abstract":"<p><p>Bayesian phylogenetics typically estimates a posterior distribution, or aspects thereof, using Markov chain Monte Carlo methods. These methods integrate over tree space by applying local rearrangements to move a tree through its space as a random walk. Previous work explored the possibility of replacing this random walk with a systematic search, but was quickly overwhelmed by the large number of probable trees in the posterior distribution. In this paper we develop methods to sidestep this problem using a recently introduced structure called the subsplit directed acyclic graph (sDAG). This structure can represent many trees at once, and local rearrangements of trees translate to methods of enlarging the sDAG. Here we propose two methods of introducing, ranking, and selecting local rearrangements on sDAGs to produce a collection of trees with high posterior density. One of these methods successfully recovers the set of high posterior density trees across a range of data sets. However, we find that a simpler strategy of aggregating trees into an sDAG in fact is computationally faster and returns a higher fraction of probable trees.</p>","PeriodicalId":50823,"journal":{"name":"Algorithms for Molecular Biology","volume":"20 1","pages":"2"},"PeriodicalIF":1.7000,"publicationDate":"2025-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11869616/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Algorithms for Molecular Biology","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1186/s13015-025-00273-x","RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}

引用次数: 0

Abstract

Bayesian phylogenetics typically estimates a posterior distribution, or aspects thereof, using Markov chain Monte Carlo methods. These methods integrate over tree space by applying local rearrangements to move a tree through its space as a random walk. Previous work explored the possibility of replacing this random walk with a systematic search, but was quickly overwhelmed by the large number of probable trees in the posterior distribution. In this paper we develop methods to sidestep this problem using a recently introduced structure called the subsplit directed acyclic graph (sDAG). This structure can represent many trees at once, and local rearrangements of trees translate to methods of enlarging the sDAG. Here we propose two methods of introducing, ranking, and selecting local rearrangements on sDAGs to produce a collection of trees with high posterior density. One of these methods successfully recovers the set of high posterior density trees across a range of data sets. However, we find that a simpler strategy of aggregating trees into an sDAG in fact is computationally faster and returns a higher fraction of probable trees.

Abstract Image

查看原文本刊更多论文

通过系统地扩展有向无环图来寻找高后验密度系统发育。

贝叶斯系统发育通常使用马尔可夫链蒙特卡罗方法估计后验分布或其各个方面。这些方法通过应用局部重排来将树作为随机行走在其空间中移动，从而在树空间中集成。先前的工作探索了用系统搜索取代随机漫步的可能性，但很快就被后验分布中大量的可能树所淹没。在本文中，我们开发了一种方法来回避这个问题，使用一种最近引入的结构，称为子分裂有向无环图（sDAG）。这种结构可以一次表示许多树，并且树的局部重排转化为扩大sDAG的方法。本文提出了引入、排序和选择sDAGs上的局部重排的两种方法，以产生具有高后验密度的树集合。其中一种方法成功地恢复了一系列数据集上的高后验密度树集。然而，我们发现将树聚合到sDAG中的更简单的策略实际上在计算上更快，并且返回更高比例的可能树。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Algorithms for Molecular Biology 生物-生化研究方法

CiteScore

2.40

自引率

10.00%

发文量

审稿时长

>12 weeks

期刊介绍： Algorithms for Molecular Biology publishes articles on novel algorithms for biological sequence and structure analysis, phylogeny reconstruction, and combinatorial algorithms and machine learning. Areas of interest include but are not limited to: algorithms for RNA and protein structure analysis, gene prediction and genome analysis, comparative sequence analysis and alignment, phylogeny, gene expression, machine learning, and combinatorial algorithms. Where appropriate, manuscripts should describe applications to real-world data. However, pure algorithm papers are also welcome if future applications to biological data are to be expected, or if they address complexity or approximation issues of novel computational problems in molecular biology. Articles about novel software tools will be considered for publication if they contain some algorithmically interesting aspects.