Random-Effects Substitution Models for Phylogenetics via Scalable Gradient Approximations.

IF 6.1 · CAS Zone 1 (Biology) · JCR Q1, Evolutionary Biology
Andrew F Magee, Andrew J Holbrook, Jonathan E Pekar, Itzue W Caviedes-Solis, Frederick A Matsen IV, Guy Baele, Joel O Wertheim, Xiang Ji, Philippe Lemey, Marc A Suchard
Systematic Biology (Journal Article), pp. 562-578. Published 2024-09-05. DOI: 10.1093/sysbio/syae019. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11498053/pdf/

Abstract

Phylogenetic and discrete-trait evolutionary inference depend heavily on an appropriate characterization of the underlying character substitution process. In this paper, we present random-effects substitution models that extend common continuous-time Markov chain models into a richer class of processes capable of capturing a wider variety of substitution dynamics. As these random-effects substitution models often require many more parameters than their usual counterparts, inference can be both statistically and computationally challenging. Thus, we also propose an efficient approach to compute an approximation to the gradient of the data likelihood with respect to all unknown substitution model parameters. We demonstrate that this approximate gradient enables scaling of sampling-based inference, namely Bayesian inference via Hamiltonian Monte Carlo, under random-effects substitution models across large trees and state-spaces. Applied to a dataset of 583 SARS-CoV-2 sequences, an HKY model with random effects shows strong signals of nonreversibility in the substitution process, and posterior predictive model checks clearly show that it is a more adequate model than a reversible model. When analyzing the pattern of phylogeographic spread of 1441 influenza A virus (H3N2) sequences among 14 regions, a random-effects phylogeographic substitution model infers that air travel volume adequately predicts almost all dispersal rates. A random-effects state-dependent substitution model reveals no evidence for an effect of arboreality on the swimming mode in the tree frog subfamily Hylinae. Simulations reveal that random-effects substitution models can accommodate both negligible and radical departures from the underlying base substitution model. We show that our gradient-based inference approach is over an order of magnitude more time efficient than conventional approaches.
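To make the idea of a random-effects extension of a base CTMC concrete, here is a minimal pure-Python sketch. It assumes (our assumption, not necessarily the paper's exact parametrization) that each off-diagonal rate of a base HKY generator is multiplied by exp(ε_ij), where ε_ij is an unconstrained random effect. With all ε_ij = 0 the model reduces to plain HKY; symmetric effects (ε_ij = ε_ji) keep it reversible; asymmetric effects generally break reversibility, which the sketch detects with Kolmogorov's criterion on 3-cycles. The function names, base frequencies, and κ value are hypothetical and illustrative. The sketch does not implement the paper's approximate gradient or its HMC sampler, and it omits the usual rate normalization.

```python
import itertools
import math

# Nucleotide order assumed for all matrices below.
STATES = "ACGT"
# HKY distinguishes transitions (A<->G, C<->T) from transversions.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def hky_rates(kappa, pi):
    """Base HKY generator: q_ij = pi_j * kappa for transitions, pi_j otherwise."""
    q = [[0.0] * 4 for _ in range(4)]
    for i, a in enumerate(STATES):
        for j, b in enumerate(STATES):
            if i != j:
                k = kappa if frozenset((a, b)) in TRANSITIONS else 1.0
                q[i][j] = pi[j] * k
        q[i][i] = -sum(q[i][j] for j in range(4) if j != i)
    return q


def random_effects_q(base, eps):
    """Hypothetical random-effects generator: q_ij = base_ij * exp(eps_ij), i != j.

    eps is an unconstrained real matrix (its diagonal is ignored); the diagonal
    of the result is reset so that every row sums to zero.
    """
    n = len(base)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = base[i][j] * math.exp(eps[i][j])
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)
    return q


def is_reversible(q, rel_tol=1e-12):
    """Kolmogorov's criterion checked on every 3-cycle i -> j -> k -> i.

    When all off-diagonal rates are positive, equal forward and backward rate
    products on all triangles is sufficient for reversibility.
    """
    for i, j, k in itertools.combinations(range(len(q)), 3):
        forward = q[i][j] * q[j][k] * q[k][i]
        backward = q[i][k] * q[k][j] * q[j][i]
        if not math.isclose(forward, backward, rel_tol=rel_tol):
            return False
    return True


if __name__ == "__main__":
    pi = [0.3, 0.2, 0.2, 0.3]           # illustrative base frequencies
    base = hky_rates(kappa=4.0, pi=pi)  # illustrative kappa
    zeros = [[0.0] * 4 for _ in range(4)]
    print(is_reversible(random_effects_q(base, zeros)))  # True: plain HKY

    sym = [[0.0] * 4 for _ in range(4)]
    sym[0][1] = sym[1][0] = 0.7         # same effect on A->C and C->A
    print(is_reversible(random_effects_q(base, sym)))    # True

    asym = [[0.0] * 4 for _ in range(4)]
    asym[0][2] = 0.5                    # A->G boosted, G->A untouched
    print(is_reversible(random_effects_q(base, asym)))   # False
```

Placing the effects on the log scale keeps every ε_ij an unconstrained real number, which suits gradient-based samplers such as Hamiltonian Monte Carlo. Computing transition probabilities P(t) = exp(Qt), the likelihood, and its gradient would additionally require a matrix exponential (for example `scipy.linalg.expm`) and is not shown here.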

Source journal: Systematic Biology (Biology: Evolutionary Biology)
CiteScore: 13.00
Self-citation rate: 7.70%
Publication volume: 70
Review time: 6-12 weeks
Journal description: Systematic Biology is the bimonthly journal of the Society of Systematic Biologists. Papers for the journal are original contributions to the theory, principles, and methods of systematics as well as phylogeny, evolution, morphology, biogeography, paleontology, genetics, and the classification of all living things. A Points of View section offers a forum for discussion, while book reviews and announcements of general interest are also featured.