Bayesian modelling of compositional heterogeneity in molecular phylogenetics.

Pub Date: 2014-10-01 DOI: 10.1515/sagmb-2013-0077
Sarah E Heaps, Tom M W Nye, Richard J Boys, Tom A Williams, T Martin Embley
Citations: 17

Abstract

In molecular phylogenetics, standard models of sequence evolution generally assume that sequence composition remains constant over evolutionary time. However, this assumption is violated in many datasets, which show substantial heterogeneity in sequence composition across taxa. We propose a model which allows compositional heterogeneity across branches, and formulate the model in a Bayesian framework. Specifically, the root and each branch of the tree each have their own composition vector, whilst a single global matrix of exchangeability parameters applies everywhere on the tree. We encourage borrowing of strength between branches by developing two possible priors for the composition vectors: one in which information can be exchanged equally amongst all branches of the tree, and another in which more information is exchanged between neighbouring branches than between distant branches. We also propose a Markov chain Monte Carlo (MCMC) algorithm for posterior inference which uses data augmentation of substitutional histories to yield a simple complete-data likelihood function that factorises over branches and allows Gibbs updates for most parameters. Standard phylogenetic models are not informative about the root position. Therefore, a significant advantage of the proposed model is that it allows inference about rooted trees. The position of the root is fundamental to the biological interpretation of trees, both for polarising trait evolution and for establishing the order of divergence among lineages. Furthermore, unlike some other related models from the literature, inference in the proposed model can be carried out through a simple MCMC scheme which does not require problematic dimension-changing moves. We investigate the performance of the model and priors in analyses of two alignments for which there is strong biological opinion about the tree topology and root position.
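The structure described above, one composition vector per branch combined with a single shared exchangeability matrix, can be illustrated with a minimal sketch. The abstract does not state the exact parameterisation, so the sketch assumes the common GTR-style form in which the off-diagonal rate from state i to state j on a branch is the exchangeability rho[i][j] multiplied by that branch's composition pi[j]. All function names, the branch labels `b1` and `b2`, and the numerical values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math


def rate_matrix(rho, pi):
    """Assumed GTR-style form: Q[i][j] = rho[i][j] * pi[j] for i != j."""
    n = len(pi)
    q = [[rho[i][j] * pi[j] if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    for i in range(n):
        q[i][i] = -sum(q[i])  # each row of a rate matrix sums to zero
    return q


def matmul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_matrix(q, t, terms=12):
    """Approximate P(t) = exp(Q t) by a Taylor series with scaling and squaring."""
    n = len(q)
    norm = max(sum(abs(x) for x in row) for row in q) * t
    squarings = max(0, math.ceil(math.log2(norm / 0.5))) if norm > 0 else 0
    scale = t / 2 ** squarings
    a = [[x * scale for x in row] for row in q]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in matmul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result


def evolve(dist, p):
    """Push a distribution over states through the transition matrix p."""
    n = len(dist)
    return [sum(dist[i] * p[i][j] for i in range(n)) for j in range(n)]


# Hypothetical values, states ordered A, C, G, T.
RHO = [[0.0, 1.0, 2.0, 1.0],   # shared (global) exchangeabilities, symmetric
       [1.0, 0.0, 1.0, 2.0],
       [2.0, 1.0, 0.0, 1.0],
       [1.0, 2.0, 1.0, 0.0]]
PI_ROOT = [0.25, 0.25, 0.25, 0.25]           # composition at the root
PI_BRANCH = {"b1": [0.1, 0.4, 0.4, 0.1],     # GC-rich branch
             "b2": [0.4, 0.1, 0.1, 0.4]}     # AT-rich branch

q_b1 = rate_matrix(RHO, PI_BRANCH["b1"])
short_run = evolve(PI_ROOT, transition_matrix(q_b1, 0.1))
long_run = evolve(PI_ROOT, transition_matrix(q_b1, 50.0))
print("after a short branch:", short_run)
print("after a long branch: ", long_run)
```

Under this assumed parameterisation with symmetric exchangeabilities, each branch's composition vector is the stationary distribution of that branch's rate matrix. The expected composition therefore drifts from the root composition towards the branch composition as branch length grows, which is how such a model can produce different compositions in different taxa.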
