Spectral cluster supertree: fast and statistically robust merging of rooted phylogenetic trees.

Impact factor: 3.9. Zone 3 (Biology). JCR Q2 (Biochemistry & Molecular Biology).

Frontiers in Molecular Biosciences. Pub Date: 2024-10-30. eCollection Date: 2024-01-01. DOI: 10.3389/fmolb.2024.1432495

Robert N McArthur, Ahad N Zehmakan, Michael A Charleston, Yu Lin, Gavin Huttley


Citations: 0

Abstract



Algorithms for phylogenetic reconstruction are central to computational molecular evolution. The relentless pace of data acquisition has exposed their poor scalability and led to the conclusion that the conventional application of these methods is impractical and not justifiable from an energy usage perspective. Furthermore, the drive to improve the statistical performance of phylogenetic methods produces increasingly parameter-rich models of sequence evolution, which worsens computational performance. Established theoretical and algorithmic results identify supertree methods as critical to divide-and-conquer strategies for improving the scalability of phylogenetic reconstruction. Of particular importance is the ability to explicitly accommodate rooted topologies, which can arise from the more biologically plausible non-stationary models of sequence evolution. We contribute to addressing this challenge with Spectral Cluster Supertree, a novel supertree method for merging a set of overlapping rooted phylogenetic trees. It offers significant improvements over Min-Cut supertree and previous state-of-the-art methods in both time complexity and overall topological accuracy, particularly for large problems. We perform comparisons against Min-Cut supertree and Bad Clade Deletion. Using two tree topology distance metrics, we demonstrate that while Bad Clade Deletion generates more correct clades in its resulting supertree, the tree generated by Spectral Cluster Supertree is generally topologically closer to the true model tree. On large datasets containing 10,000 taxa and $\sim$500 source trees, where Bad Clade Deletion usually takes $\sim$2 h to run, our method generates a supertree in 20 s on average. Spectral Cluster Supertree is released under an open source license and is available on the Python Package Index as sc-supertree.
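The abstract contrasts two ways of scoring a supertree against the true model tree: counting correct clades and measuring topological distance. It does not name the two metrics used, so the following is only a minimal sketch of one common choice, a Robinson-Foulds-style clade distance on rooted trees, and is not necessarily what the authors computed. The nested-tuple tree representation and the function names `clades`, `correct_clades`, and `clade_distance` are assumptions made for illustration.

```python
def clades(tree):
    """Return the non-trivial clades of a rooted tree as frozensets of leaf names.

    Assumed representation: a leaf is a string, an internal node is a tuple
    of child subtrees.
    """
    found = set()

    def leaves_below(node):
        if isinstance(node, str):
            return frozenset([node])
        leaf_set = frozenset().union(*(leaves_below(child) for child in node))
        found.add(leaf_set)  # every internal node defines one clade
        return leaf_set

    all_leaves = leaves_below(tree)
    found.discard(all_leaves)  # the root clade is shared by any tree on these taxa
    return found


def correct_clades(model_tree, estimated_tree):
    """Count clades of the estimated tree that also occur in the model tree."""
    return len(clades(model_tree) & clades(estimated_tree))


def clade_distance(tree_a, tree_b):
    """Number of clades found in exactly one of the two trees (symmetric difference)."""
    return len(clades(tree_a) ^ clades(tree_b))


# Hypothetical five-taxon example: the estimate misplaces taxon B.
model = ((("A", "B"), "C"), ("D", "E"))
estimate = ((("A", "C"), "B"), ("D", "E"))

shared = correct_clades(model, estimate)
distance = clade_distance(model, estimate)
print(shared, distance)
```

Both quantities come from the same clade sets. A method can recover many correct clades and still score poorly on other distance measures that weigh where the errors occur, which is consistent with the two methods ranking differently under the two metrics in the abstract.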


Source journal

Frontiers in Molecular Biosciences (Biochemistry, Genetics and Molecular Biology: Biochemistry)

CiteScore: 7.20
Self-citation rate: 4.00%
Articles published: 1361
Review time: 14 weeks

About the journal: Much of contemporary investigation in the life sciences is devoted to the molecular-scale understanding of the relationships between genes and the environment, in particular dynamic alterations in the levels, modifications, and interactions of cellular effectors, including proteins. Frontiers in Molecular Biosciences offers an international publication platform for basic as well as applied research; we encourage contributions spanning both established and emerging areas of biology. To this end, the journal draws from empirical disciplines such as structural biology, enzymology, biochemistry, and biophysics, capitalizing as well on the technological advancements that have enabled metabolomics and proteomics measurements in massively parallel throughput, and the development of robust and innovative computational biology strategies. We also recognize influences from medicine and technology, welcoming studies in molecular genetics, molecular diagnostics and therapeutics, and nanotechnology. Our ultimate objective is the comprehensive illustration of the molecular mechanisms regulating proteins, nucleic acids, carbohydrates, lipids, and small metabolites in organisms across all branches of life. In addition to interesting new findings, techniques, and applications, Frontiers in Molecular Biosciences will consider new testable hypotheses to inspire different perspectives and stimulate scientific dialogue. The integration of in silico, in vitro, and in vivo approaches will benefit endeavors across all domains of the life sciences.