Infinite Mixture Models for Improved Modeling of Across-Site Evolutionary Variation.

IF 5.3 | Zone 1, Biology | Q1, Biochemistry & Molecular Biology
Mandev S Gill, Guy Baele, Marc A Suchard, Philippe Lemey
Journal: Molecular Biology and Evolution
Publication date: 2025-07-30
Publication type: Journal Article
DOI: https://doi.org/10.1093/molbev/msaf199
PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12393045/pdf/
Citations: 0

Abstract

Scientific studies in many areas of biology routinely employ evolutionary analyses based on inference of phylogenetic trees from molecular sequence data. Evolutionary processes that act at the molecular level are highly variable, and properly accounting for heterogeneity is crucial for more accurate phylogenetic inference. Nucleotide substitution rates and patterns are known to vary among sites in multiple sequence alignments, and such variation can be modeled by partitioning alignments into categories corresponding to different substitution models. Determining appropriate partitions a priori can be difficult, however, and better model fit can be achieved through flexible Bayesian infinite mixture models that simultaneously infer the number of partitions, the partition that each site belongs to, and the evolutionary parameters corresponding to each partition. Here, we consider several different types of infinite mixture models, including classic Dirichlet process mixtures, as well as novel approaches for modeling across-site evolutionary variation: hierarchical models for data with a natural group structure, and infinite hidden Markov models that account for spatial patterns in alignments. In analyses of several viral data sets, we find that different types of models perform best in different scenarios, but infinite hidden Markov models emerge as particularly promising for larger data sets and complex evolutionary patterns characterized by multiple genes and overlapping reading frames. To enable these models to scale to large data sets, we adapt efficient Markov chain Monte Carlo algorithms and exploit opportunities for parallel computing. We implement this infinite mixture modeling framework in BEAST X, a widely used software package for phylogenetic inference.
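
As a rough illustration of how a Dirichlet process mixture can leave the number of partitions open, the following minimal Python sketch draws site-to-category assignments from a Chinese restaurant process prior. It is not the BEAST X implementation and does not involve sequence data, substitution models, or posterior inference; the function name `sample_crp_partition`, the concentration parameter `alpha`, and the site count are assumptions chosen only for this example.

```python
import random


def sample_crp_partition(n_sites, alpha, seed=0):
    """Draw site-to-category assignments from a Chinese restaurant process prior.

    Hypothetical helper for illustration only; alpha must be positive.
    """
    rng = random.Random(seed)
    assignments = []
    counts = []  # counts[k] = number of sites currently assigned to category k
    for _ in range(n_sites):
        # An existing category k is chosen with weight counts[k];
        # a brand-new category is opened with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)   # open a new category
        else:
            counts[k] += 1     # join an existing category
        assignments.append(k)
    return assignments


# Assumed toy settings: 1000 alignment sites, concentration alpha = 1.0.
partition = sample_crp_partition(n_sites=1000, alpha=1.0)
n_categories = len(set(partition))
print(f"{n_categories} categories for {len(partition)} sites")
```

The number of categories is an outcome of the draw rather than an input, which is the property the abstract relies on when it describes inferring the number of partitions jointly with the assignments. Larger values of `alpha` tend to produce more categories.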

Source journal
Molecular Biology and Evolution
Category: Biology, Evolutionary Biology
CiteScore: 19.70
Self-citation rate: 3.70%
Articles published: 257
Review time: 1 month
Journal overview: Molecular Biology and Evolution publishes research at the interface of molecular (including genomics) and evolutionary biology. It considers manuscripts containing patterns, processes, and predictions at all levels of organization: population, taxonomic, functional, and phenotypic. It is interested in fundamental discoveries, new and improved methods, resources, technologies, and theories advancing evolutionary research. It also publishes balanced reviews of recent developments in genome evolution and forward-looking perspectives suggesting future directions in molecular evolution applications.