Infinite Mixture Models for Improved Modeling of Across-Site Evolutionary Variation.

Impact factor: 5.3 | Zone 1 (Biology) | JCR Q1: Biochemistry & Molecular Biology

Molecular Biology and Evolution | Pub Date: 2025-07-30 | DOI: 10.1093/molbev/msaf199

Mandev S Gill, Guy Baele, Marc A Suchard, Philippe Lemey

Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12393045/pdf/

Citations: 0

Abstract


Scientific studies in many areas of biology routinely employ evolutionary analyses based on inference of phylogenetic trees from molecular sequence data. Evolutionary processes that act at the molecular level are highly variable, and properly accounting for heterogeneity is crucial for more accurate phylogenetic inference. Nucleotide substitution rates and patterns are known to vary among sites in multiple sequence alignments, and such variation can be modeled by partitioning alignments into categories corresponding to different substitution models. Determining a priori appropriate partitions can be difficult, however, and better model fit can be achieved through flexible Bayesian infinite mixture models that simultaneously infer the number of partitions, the partition that each site belongs to, and the evolutionary parameters corresponding to each partition. Here, we consider several different types of infinite mixture models, including classic Dirichlet process mixtures, as well as novel approaches for modeling across-site evolutionary variation: hierarchical models for data with a natural group structure, and infinite hidden Markov models that account for spatial patterns in alignments. In analyses of several viral data sets, we find that different types of models perform best in different scenarios, but infinite hidden Markov models emerge as particularly promising for larger data sets and complex evolutionary patterns characterized by multiple genes and overlapping reading frames. To enable these models to scale to large data sets, we adapt efficient Markov chain Monte Carlo algorithms and exploit opportunities for parallel computing. We implement this infinite mixture modeling framework in BEAST X, a widely-used software package for phylogenetic inference.
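As a rough illustration of how a Dirichlet process prior lets the number of site categories be inferred rather than fixed in advance, the following minimal sketch draws a partition of alignment sites from a Chinese restaurant process. It is not the authors' BEAST X implementation: the function name `sample_crp_partition`, the concentration parameter `alpha`, and the site count are assumptions chosen for illustration, and the sketch covers only the prior on partitions, not the substitution-model likelihood or the MCMC inference described above.

```python
import random


def sample_crp_partition(n_sites, alpha, rng):
    """Draw one category label per site from a Chinese restaurant process prior.

    Hypothetical helper for illustration only; `alpha` is the concentration
    parameter (larger values tend to produce more categories).
    """
    labels = []  # labels[i] = category assigned to site i
    counts = []  # counts[k] = number of sites currently in category k
    for _ in range(n_sites):
        # An existing category k is chosen with weight counts[k];
        # a brand-new category is opened with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new category
        else:
            counts[k] += 1    # join an existing category
        labels.append(k)
    return labels


rng = random.Random(1)
labels = sample_crp_partition(n_sites=30, alpha=1.0, rng=rng)
print("site labels:", labels)
print("number of categories:", len(set(labels)))
```

In a full mixture model, each category would carry its own substitution-model parameters, and the labels would be updated conditional on the sequence data rather than drawn from the prior alone.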


Source journal

Molecular Biology and Evolution (Biology: Evolutionary Biology)

CiteScore: 19.70

Self-citation rate: 3.70%

Articles published: 257

Review time: 1 month

Journal description: Molecular Biology and Evolution publishes research at the interface of molecular (including genomics) and evolutionary biology. It considers manuscripts containing patterns, processes, and predictions at all levels of organization: population, taxonomic, functional, and phenotypic. It is interested in fundamental discoveries, new and improved methods, resources, technologies, and theories advancing evolutionary research. It also publishes balanced reviews of recent developments in genome evolution and forward-looking perspectives suggesting future directions in molecular evolution applications.