MixtureFinder: Estimating DNA Mixture Models for Phylogenetic Analyses.

Impact factor: 11 | Zone 1 (Biology) | JCR Q1, BIOCHEMISTRY & MOLECULAR BIOLOGY

Molecular Biology and Evolution | Pub date: 2025-01-06 | DOI: 10.1093/molbev/msae264

Huaiyan Ren, Thomas K F Wong, Bui Quang Minh, Robert Lanfear


Citations: 0

Abstract



In phylogenetic studies, both partitioned models and mixture models are used to account for heterogeneity in molecular evolution among the sites of DNA sequence alignments. Partitioned models require the user to specify the grouping of sites into subsets, and then assume that each subset of sites can be modeled by a single common process. Mixture models do not require users to prespecify subsets of sites, and instead calculate the likelihood of every site under every model, while co-estimating the model weights and parameters. While much research has gone into the optimization of partitioned models by merging user-specified subsets, less attention has been paid to the optimization of mixture models for DNA sequence alignments. In this study, we first ask whether a key assumption of partitioned models (that each user-specified subset can be modeled by a single common process) is supported by the data. Having shown that it is not, we then design, implement, test, and apply an algorithm, MixtureFinder, to select the optimum number of classes for a mixture model of Q-matrices for the standard models of DNA sequence evolution. We show that this algorithm performs well on simulated and empirical datasets and suggest that it may be useful for future empirical studies. MixtureFinder is available in IQ-TREE2, and a tutorial for using MixtureFinder can be found here: http://www.iqtree.org/doc/Complex-Models#mixture-models.
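The abstract's description of a mixture model (every site's likelihood is computed under every class, while the class weights are co-estimated) can be made concrete with a small sketch. The Python below is illustrative only and is not MixtureFinder's implementation. It assumes the per-class site likelihoods have already been computed (in practice, by evaluating each site on the tree under each class's Q-matrix), updates only the mixture weights by expectation-maximization, and compares fits with different numbers of classes using BIC. The function names, the toy numbers, and the use of BIC as the selection criterion are all assumptions for illustration; MixtureFinder's own search and selection procedure is described in the paper and the IQ-TREE tutorial.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical helpers for illustration; not part of IQ-TREE or MixtureFinder.


def mixture_log_likelihood(site_liks: Sequence[Sequence[float]],
                           weights: Sequence[float]) -> float:
    """Alignment log-likelihood under a mixture of K classes.

    site_liks[i][k] is the likelihood of site i under class k (e.g. one
    Q-matrix); weights[k] is the weight of class k, with sum(weights) == 1.
    Each site contributes log(sum_k w_k * L_ik): sites are never assigned to
    a single class in advance, unlike in a partitioned model.
    """
    return sum(math.log(sum(w * l for w, l in zip(weights, row)))
               for row in site_liks)


def estimate_weights(site_liks: Sequence[Sequence[float]],
                     n_iter: int = 200) -> List[float]:
    """EM estimate of the class weights, holding per-class site likelihoods fixed."""
    n_sites, n_classes = len(site_liks), len(site_liks[0])
    weights = [1.0 / n_classes] * n_classes  # start from equal weights
    for _ in range(n_iter):
        totals = [0.0] * n_classes
        for row in site_liks:
            # E-step: posterior probability that this site belongs to each class
            joint = [w * l for w, l in zip(weights, row)]
            denom = sum(joint)
            for k in range(n_classes):
                totals[k] += joint[k] / denom
        # M-step: each new weight is the mean posterior membership of that class
        weights = [t / n_sites for t in totals]
    return weights


def bic(log_lik: float, n_params: int, n_sites: int) -> float:
    """Bayesian information criterion; lower is better."""
    return -2.0 * log_lik + n_params * math.log(n_sites)


def select_num_classes(fits: Dict[int, Tuple[float, int]], n_sites: int) -> int:
    """Return the class count whose (log_lik, n_params) fit has the lowest BIC."""
    return min(fits, key=lambda k: bic(fits[k][0], fits[k][1], n_sites))


if __name__ == "__main__":
    # Toy per-site likelihoods for 4 sites under 2 classes (made-up numbers).
    toy = [[0.020, 0.001], [0.018, 0.002], [0.001, 0.030], [0.015, 0.004]]
    w = estimate_weights(toy)
    print("weights:", [round(x, 3) for x in w])
    print("logL (equal weights):", mixture_log_likelihood(toy, [0.5, 0.5]))
    print("logL (EM weights):   ", mixture_log_likelihood(toy, w))
    # Made-up (log-likelihood, free-parameter count) pairs for K = 1, 2, 3.
    fits = {1: (-1000.0, 8), 2: (-950.0, 17), 3: (-948.0, 26)}
    print("classes chosen by BIC:", select_num_classes(fits, n_sites=1000))
```

Because the weights here are optimized with the per-class likelihoods held fixed, the sketch covers only the weight-estimation part of the co-estimation. A full implementation would also re-optimize each class's Q-matrix parameters and the branch lengths, and would work in log space with scaling to avoid numerical underflow on long alignments.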


Source journal: Molecular Biology and Evolution (Biology: Evolutionary Biology)
CiteScore: 19.70
Self-citation rate: 3.70%
Articles published: 257
Review time: 1 month

Journal overview: Molecular Biology and Evolution publishes research at the interface of molecular (including genomic) and evolutionary biology. It considers manuscripts addressing patterns, processes, and predictions at all levels of organization: population, taxonomic, functional, and phenotypic. It is interested in fundamental discoveries and in new and improved methods, resources, technologies, and theories that advance evolutionary research, and it publishes balanced reviews of recent developments in genome evolution as well as forward-looking perspectives suggesting future directions in molecular evolution applications.