Inference of Phylogenetic Networks from Sequence Data using Composite Likelihood.

IF 6.1, Zone 1 (Biology), JCR Q1 (Evolutionary Biology)
Sungsik Kong, David L Swofford, Laura S Kubatko
Systematic Biology. Published 2024-10-10. DOI: 10.1093/sysbio/syae054 (https://doi.org/10.1093/sysbio/syae054)
Citations: 0

Abstract

While phylogenies have been essential in understanding how species evolve, they do not adequately describe some evolutionary processes. For instance, hybridization, a common phenomenon where interbreeding between two species leads to formation of a new species, must be depicted by a phylogenetic network, a structure that modifies a phylogenetic tree by allowing two branches to merge into one, resulting in reticulation. However, existing methods for estimating networks become computationally expensive as the dataset size and/or topological complexity increase. The lack of methods for scalable inference hampers phylogenetic networks from being widely used in practice, despite accumulating evidence that hybridization occurs frequently in nature. Here, we propose a novel method, PhyNEST (Phylogenetic Network Estimation using SiTe patterns), that estimates binary, level-1 phylogenetic networks with a fixed, user-specified number of reticulations directly from sequence data. By using the composite likelihood as the basis for inference, PhyNEST is able to use the full genomic data in a computationally tractable manner, eliminating the need to summarize the data as a set of gene trees prior to network estimation. To search network space, PhyNEST implements both hill climbing and simulated annealing algorithms. PhyNEST assumes that the data are composed of coalescent independent sites that evolve according to the Jukes-Cantor substitution model and that the network has a constant effective population size. Simulation studies demonstrate that PhyNEST is often more accurate than two existing composite likelihood summary methods (SNaQ and PhyloNet) and that it is robust to at least one form of model misspecification (assuming a less complex nucleotide substitution model than the true generating model). 
We applied PhyNEST to reconstruct the evolutionary relationships among Heliconius butterflies and Papionini primates, characterized by hybrid speciation and widespread introgression, respectively. PhyNEST is implemented in an open-source Julia package and is publicly available at https://github.com/sungsik-kong/PhyNEST.jl.
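
The composite-likelihood idea described above can be illustrated with a minimal Python sketch. This is not PhyNEST's code (PhyNEST is a Julia package), and every name in it is hypothetical. It assumes the composite likelihood is built from four-taxon subsets of the alignment, with each site treated as independent. Under the Jukes-Cantor model all bases are interchangeable, so a site pattern's probability depends only on which taxa share a base; the sketch therefore groups the 256 possible four-taxon columns into 15 classes. The function `class_prob` is a placeholder for the model-derived pattern probabilities, which are not derived here.

```python
import math
from collections import Counter
from itertools import combinations


def pattern_class(column):
    """Relabel bases by order of first appearance, e.g. G,G,A,T -> 'xxyz'."""
    labels = {}
    out = []
    for base in column:
        if base not in labels:
            labels[base] = "xyzw"[len(labels)]
        out.append(labels[base])
    return "".join(out)


def class_counts(alignment, quartet):
    """Count site-pattern classes for four taxa, skipping gaps and ambiguity codes."""
    seqs = [alignment[name] for name in quartet]
    counts = Counter()
    for column in zip(*seqs):
        if all(base in "ACGT" for base in column):
            counts[pattern_class(column)] += 1
    return counts


def composite_log_likelihood(alignment, class_prob):
    """Sum count * log(probability) over every four-taxon subset.

    class_prob(quartet, cls) is a hypothetical callable that returns the
    probability of pattern class `cls` for `quartet` under a candidate network.
    """
    total = 0.0
    for quartet in combinations(sorted(alignment), 4):
        for cls, n in class_counts(alignment, quartet).items():
            total += n * math.log(class_prob(quartet, cls))
    return total


if __name__ == "__main__":
    # Toy alignment; the constant probability is a stand-in, not a real model.
    toy = {"A": "ACGT", "B": "ACGA", "C": "ACTT", "D": "AGGT", "E": "ACGT"}
    print(composite_log_likelihood(toy, lambda quartet, cls: 0.5))
```

A search procedure such as hill climbing or simulated annealing would then propose candidate networks and keep those that raise this score. The sketch leaves that step out, because it depends on how pattern probabilities are computed for each network.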

Source journal
Systematic Biology (Biology: Evolutionary Biology)
CiteScore: 13.00
Self-citation rate: 7.70%
Articles published: 70
Review time: 6-12 weeks
Journal description: Systematic Biology is the bimonthly journal of the Society of Systematic Biologists. Papers for the journal are original contributions to the theory, principles, and methods of systematics as well as phylogeny, evolution, morphology, biogeography, paleontology, genetics, and the classification of all living things. A Points of View section offers a forum for discussion, while book reviews and announcements of general interest are also featured.