Predicting Phylogenetic Bootstrap Values via Machine Learning.

IF 11 1区 生物学 Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY
Julius Wiegert, Dimitri Höhler, Julia Haag, Alexandros Stamatakis
{"title":"Predicting Phylogenetic Bootstrap Values via Machine Learning.","authors":"Julius Wiegert, Dimitri Höhler, Julia Haag, Alexandros Stamatakis","doi":"10.1093/molbev/msae215","DOIUrl":null,"url":null,"abstract":"<p><p>Estimating the statistical robustness of the inferred tree(s) constitutes an integral part of most phylogenetic analyses. Commonly, one computes and assigns a branch support value to each inner branch of the inferred phylogeny. The still most widely used method for calculating branch support on trees inferred under maximum likelihood (ML) is the Standard, nonparametric Felsenstein bootstrap support (SBS). Due to the high computational cost of the SBS, a plethora of methods has been developed to approximate it, for instance, via the rapid bootstrap (RB) algorithm. There have also been attempts to devise faster, alternative support measures, such as the SH-aLRT (Shimodaira-Hasegawa-like approximate likelihood ratio test) or the UltraFast bootstrap 2 (UFBoot2) method. Those faster alternatives exhibit some limitations, such as the need to assess model violations (UFBoot2) or unstable behavior in the low support interval range (SH-aLRT). Here, we present the educated bootstrap guesser (EBG), a machine learning-based tool that predicts SBS branch support values for a given input phylogeny. EBG is on average 9.4 (σ=5.5) times faster than UFBoot2. EBG-based SBS estimates exhibit a median absolute error of 5 when predicting SBS values between 0 and 100. Furthermore, EBG also provides uncertainty measures for all per-branch SBS predictions and thereby allows for a more rigorous and careful interpretation. EBG can, for instance, predict SBS support values on a phylogeny comprising 1,654 SARS-CoV2 genome sequences within 3 h on a mid-class laptop. EBG is available under GNU GPL3.</p>","PeriodicalId":18730,"journal":{"name":"Molecular biology and evolution","volume":" ","pages":""},"PeriodicalIF":11.0000,"publicationDate":"2024-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11523138/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Molecular biology and evolution","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1093/molbev/msae215","RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOCHEMISTRY & MOLECULAR BIOLOGY","Score":null,"Total":0}
引用次数: 0

Abstract

Estimating the statistical robustness of the inferred tree(s) constitutes an integral part of most phylogenetic analyses. Commonly, one computes and assigns a branch support value to each inner branch of the inferred phylogeny. The still most widely used method for calculating branch support on trees inferred under maximum likelihood (ML) is the Standard, nonparametric Felsenstein bootstrap support (SBS). Due to the high computational cost of the SBS, a plethora of methods has been developed to approximate it, for instance, via the rapid bootstrap (RB) algorithm. There have also been attempts to devise faster, alternative support measures, such as the SH-aLRT (Shimodaira-Hasegawa-like approximate likelihood ratio test) or the UltraFast bootstrap 2 (UFBoot2) method. Those faster alternatives exhibit some limitations, such as the need to assess model violations (UFBoot2) or unstable behavior in the low support interval range (SH-aLRT). Here, we present the educated bootstrap guesser (EBG), a machine learning-based tool that predicts SBS branch support values for a given input phylogeny. EBG is on average 9.4 (σ=5.5) times faster than UFBoot2. EBG-based SBS estimates exhibit a median absolute error of 5 when predicting SBS values between 0 and 100. Furthermore, EBG also provides uncertainty measures for all per-branch SBS predictions and thereby allows for a more rigorous and careful interpretation. EBG can, for instance, predict SBS support values on a phylogeny comprising 1,654 SARS-CoV2 genome sequences within 3 h on a mid-class laptop. EBG is available under GNU GPL3.

通过机器学习预测系统发育引导值
估计推断树的统计稳健性是大多数系统发生学分析不可或缺的一部分。通常,我们会计算并给推断出的系统发生的每个内部分支分配一个分支支持度值。在最大似然法(ML)下计算推断树的分支支持度时,目前最广泛使用的方法是标准非参数费尔森斯坦引导支持法(SBS)。由于 SBS 的计算成本较高,人们开发了大量方法来近似 SBS,例如通过快速引导(RB)算法。此外,还有人尝试设计更快的替代支持度量,如 SH-aLRT(下平-长谷川类近似似然比检验)或超快速引导 2(UFBoot2)方法。这些更快的替代方法有一些局限性,比如需要评估模型违反情况(UFBoot2)或在低支持区间范围内的不稳定行为(SH-aLRT)。在此,我们介绍了教育引导猜测器(EBG),这是一种基于机器学习的工具,可预测给定输入系统发生的 SBS 分支支持值。EBG 比 UFBoot2 平均快 9.4 (σ = 5.5) 倍。在预测 0 到 100 之间的 SBS 值时,基于 EBG 的 SBS 估计值的中位绝对误差为 5。此外,EBG 还为所有每个分支的 SBS 预测提供了不确定性度量,因此可以进行更严格、更仔细的解释。例如,EBG 可以在一台中档笔记本电脑上,在 3 小时内预测由 1654 美元 SARS-CoV2 基因组序列组成的系统发生的 SBS 支持值。EBG 在 GNU GPL3 下提供。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Molecular biology and evolution
Molecular biology and evolution 生物-进化生物学
CiteScore
19.70
自引率
3.70%
发文量
257
审稿时长
1 months
期刊介绍: Molecular Biology and Evolution Journal Overview: Publishes research at the interface of molecular (including genomics) and evolutionary biology Considers manuscripts containing patterns, processes, and predictions at all levels of organization: population, taxonomic, functional, and phenotypic Interested in fundamental discoveries, new and improved methods, resources, technologies, and theories advancing evolutionary research Publishes balanced reviews of recent developments in genome evolution and forward-looking perspectives suggesting future directions in molecular evolution applications.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信