Kate Truman, Timothy G Vaughan, Alex Gavryushkin, Alexandra Sasha Gavryushkina
{"title":"The Fossilised Birth-Death Model is Identifiable.","authors":"Kate Truman, Timothy G Vaughan, Alex Gavryushkin, Alexandra Sasha Gavryushkina","doi":"10.1093/sysbio/syae058","DOIUrl":"https://doi.org/10.1093/sysbio/syae058","url":null,"abstract":"Time-dependent birth-death sampling models have been used in numerous studies for inferring past evolutionary dynamics in different biological contexts, e.g. speciation and extinction rates in macroevolutionary studies, or effective reproductive number in epidemiological studies. These models are branching processes where lineages can bifurcate, die, or be sampled with time-dependent birth, death, and sampling rates, generating phylogenetic trees. It has been shown that in some subclasses of such models, different sets of rates can result in the same distributions of reconstructed phylogenetic trees, and therefore the rates become unidentifiable from the trees regardless of their size. Here we show that widely used time-dependent fossilised birth-death (FBD) models are identifiable. This subclass of models makes more realistic assumptions about the fossilisation process and certain infectious disease transmission processes than the unidentifiable birth-death sampling models. Namely, FBD models assume that sampled lineages stay in the process rather than being immediately removed upon sampling. Identifiability of the time-dependent FBD model justifies using statistical methods that implement this model to infer the underlying temporal diversification or epidemiological dynamics from phylogenetic trees or directly from molecular or other comparative data. We further show that the time-dependent fossilised-birth-death model with an extra parameter, the removal after sampling probability, is unidentifiable. 
This implies that in scenarios where we do not know how sampling affects lineages we are unable to infer this extra parameter together with birth, death, and sampling rates solely from trees.","PeriodicalId":22120,"journal":{"name":"Systematic Biology","volume":" ","pages":""},"PeriodicalIF":6.1,"publicationDate":"2024-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142475252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Benjamin S Toups, Robert C Thomson, Jeremy M Brown
{"title":"Complex Models of Sequence Evolution Improve Fit, but not Gene Tree Discordance, for Tetrapod Mitogenomes.","authors":"Benjamin S Toups, Robert C Thomson, Jeremy M Brown","doi":"10.1093/sysbio/syae056","DOIUrl":"https://doi.org/10.1093/sysbio/syae056","url":null,"abstract":"Variation in gene tree estimates is widely observed in empirical phylogenomic data and is often assumed to be the result of biological processes. However, a recent study using tetrapod mitochondrial genomes to control for biological sources of variation due to their haploid, uniparentally inherited, and non-recombining nature found that levels of discordance among mitochondrial gene trees were comparable to those found in studies that assume only biological sources of variation. Additionally, they found that several of the models of sequence evolution chosen to infer gene trees were doing an inadequate job fitting the sequence data. These results indicated that significant amounts of gene tree discordance in empirical data may be due to poor fit of sequence evolution models, and that more complex and biologically realistic models may be needed. To test how the fit of sequence evolution models relates to gene tree discordance, we analyzed the same mitochondrial datasets as the previous study using two additional, more complex models of sequence evolution that each include a different biologically realistic aspect of the evolutionary process: a covarion model to incorporate site-specific rate variation across lineages (heterotachy), and a partitioned model to incorporate variable evolutionary patterns by codon position. Our results show that both additional models fit the data better than the models used in the previous study, with the covarion being consistently and strongly preferred as tree size increases. 
However, even these more preferred models still inferred highly discordant mitochondrial gene trees, thus deepening the mystery around what we label the \"Mito-Phylo Paradox\" and leading us to ask whether the observed variation could, in fact, be biological in nature after all.","PeriodicalId":22120,"journal":{"name":"Systematic Biology","volume":" ","pages":""},"PeriodicalIF":6.1,"publicationDate":"2024-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142406814","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Inference of Phylogenetic Networks from Sequence Data using Composite Likelihood.","authors":"Sungsik Kong, David L Swofford, Laura S Kubatko","doi":"10.1093/sysbio/syae054","DOIUrl":"https://doi.org/10.1093/sysbio/syae054","url":null,"abstract":"While phylogenies have been essential in understanding how species evolve, they do not adequately describe some evolutionary processes. For instance, hybridization, a common phenomenon where interbreeding between two species leads to formation of a new species, must be depicted by a phylogenetic network, a structure that modifies a phylogenetic tree by allowing two branches to merge into one, resulting in reticulation. However, existing methods for estimating networks become computationally expensive as the dataset size and/or topological complexity increase. The lack of methods for scalable inference hampers phylogenetic networks from being widely used in practice, despite accumulating evidence that hybridization occurs frequently in nature. Here, we propose a novel method, PhyNEST (Phylogenetic Network Estimation using SiTe patterns), that estimates binary, level-1 phylogenetic networks with a fixed, user-specified number of reticulations directly from sequence data. By using the composite likelihood as the basis for inference, PhyNEST is able to use the full genomic data in a computationally tractable manner, eliminating the need to summarize the data as a set of gene trees prior to network estimation. To search network space, PhyNEST implements both hill climbing and simulated annealing algorithms. PhyNEST assumes that the data are composed of coalescent independent sites that evolve according to the Jukes-Cantor substitution model and that the network has a constant effective population size. 
Simulation studies demonstrate that PhyNEST is often more accurate than two existing composite likelihood summary methods (SNaQ and PhyloNet) and that it is robust to at least one form of model misspecification (assuming a less complex nucleotide substitution model than the true generating model). We applied PhyNEST to reconstruct the evolutionary relationships among Heliconius butterflies and Papionini primates, characterized by hybrid speciation and widespread introgression, respectively. PhyNEST is implemented in an open-source Julia package and is publicly available at https://github.com/sungsik-kong/PhyNEST.jl.","PeriodicalId":22120,"journal":{"name":"Systematic Biology","volume":" ","pages":""},"PeriodicalIF":6.1,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142401387","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Laura P A Mulvey, Michael R May, Jeremy M Brown, Sebastian Höhna, April M Wright, Rachel C M Warnock
{"title":"Assessing the Adequacy of Morphological Models using Posterior Predictive Simulations","authors":"Laura P A Mulvey, Michael R May, Jeremy M Brown, Sebastian Höhna, April M Wright, Rachel C M Warnock","doi":"10.1093/sysbio/syae055","DOIUrl":"https://doi.org/10.1093/sysbio/syae055","url":null,"abstract":"Reconstructing the evolutionary history of different groups of organisms provides insight into how life originated and diversified on Earth. Phylogenetic trees are commonly used to estimate this evolutionary history. Within Bayesian phylogenetics a major step in estimating a tree is in choosing an appropriate model of character evolution. While the most common character data used is molecular sequence data, morphological data remains a vital source of information. The use of morphological characters allows for the incorporation of fossil taxa, and despite advances in molecular sequencing, continues to play a significant role in neontology. Moreover, it is the main data source that allows us to unite extinct and extant taxa directly under the same generating process. We therefore require suitable models of morphological character evolution, the most common being the Mk Lewis model. While it is frequently used in both palaeobiology and neontology, it is not known whether the simple Mk substitution model, or any extensions to it, provide a sufficiently good description of the process of morphological evolution. In this study we investigate the impact of different morphological models on empirical tetrapod data sets. Specifically, we compare unpartitioned Mk models with those where characters are partitioned by the number of observed states, both with and without allowing for rate variation across sites and accounting for ascertainment bias. We show that the choice of substitution model has an impact on both topology and branch lengths, highlighting the importance of model choice. 
Through simulations, we validate the use of the model adequacy approach, posterior predictive simulations, for choosing an appropriate model. Additionally, we compare the performance of model adequacy with Bayesian model selection. We demonstrate how model selection approaches based on marginal likelihoods are not appropriate for choosing between models with partition schemes that vary in character state space (i.e., that vary in Q-matrix state size). Using posterior predictive simulations, we found that current variations of the Mk model are often performing adequately in capturing the evolutionary dynamics that generated our data. We do not find any preference for a particular model extension across multiple data sets, indicating that there is no ‘one size fits all’ when it comes to morphological data and that careful consideration should be given to choosing models of discrete character evolution. By using suitable models of character evolution, we can increase our confidence in our phylogenetic estimates, which should in turn allow us to gain more accurate insights into the evolutionary history of both extinct and extant taxa.","PeriodicalId":22120,"journal":{"name":"Systematic Biology","volume":"54 1","pages":""},"PeriodicalIF":6.5,"publicationDate":"2024-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142384288","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Phylogenomics of Bivalvia Using Ultraconserved Elements (UCEs) Reveal New Topologies for Pteriomorphia and Imparidentia.","authors":"Yi-Xuan Li, Jack Chi-Ho Ip, Chong Chen, Ting Xu, Qian Zhang, Yanan Sun, Pei-Zhen Ma, Jian-Wen Qiu","doi":"10.1093/sysbio/syae052","DOIUrl":"https://doi.org/10.1093/sysbio/syae052","url":null,"abstract":"Despite significant advances in phylogenetics over the past decades, the deep relationships within Bivalvia (phylum Mollusca) remain inconclusive. Previous efforts based on morphology or several genes have failed to resolve many key nodes in the phylogeny of Bivalvia. Advances have been made recently using transcriptome data, but the phylogenetic relationships within Bivalvia historically lacked consensus, especially within Pteriomorphia and Imparidentia. Here, we inferred the relationships of key lineages within Bivalvia using matrices generated from specifically designed ultraconserved elements (UCEs) with 16 available genomic resources and 85 newly sequenced specimens from 55 families. Our new probes (Bivalve UCE 2k v.1) for target sequencing captured an average of 849 UCEs with 1085-bp in mean length from in vitro experiments. Our results introduced novel schemes from six major clades (Protobranchina, Pteriomorphia, Palaeoheterodonta, Archiheterodonta, Anomalodesmata and Imparidentia), though some inner nodes were poorly resolved, such as paraphyletic Heterodonta in some topologies potentially due to insufficient taxon sampling. The resolution increased when analyzing specific matrices for Pteriomorphia and Imparidentia. We recovered three Pteriomorphia topologies different from previously published trees, with the strongest support for ((Ostreida + (Arcida + Mytilida)) + (Pectinida + (Limida + Pectinida))). Limida were nested within Pectinida, warranting further studies. 
For Imparidentia, our results strongly supported the new hypothesis of (Galeommatida + (Adapedonta + Cardiida)), while the possible non-monophyly of Lucinida was inferred but poorly supported. Overall, our results provide important insights into the phylogeny of Bivalvia and show that target enrichment sequencing of UCEs can be broadly applied to study both deep and shallow phylogenetic relationships.","PeriodicalId":22120,"journal":{"name":"Systematic Biology","volume":" ","pages":""},"PeriodicalIF":6.1,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142295902","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Brian P Waldron, Emily F Watts, Donald J Morgan, Maggie M Hantak, Alan R Lemmon, Emily Moriarty Lemmon, Shawn R Kuchta
{"title":"The limits of the metapopulation: Lineage fragmentation in a widespread terrestrial salamander (Plethodon cinereus)","authors":"Brian P Waldron, Emily F Watts, Donald J Morgan, Maggie M Hantak, Alan R Lemmon, Emily Moriarty Lemmon, Shawn R Kuchta","doi":"10.1093/sysbio/syae053","DOIUrl":"https://doi.org/10.1093/sysbio/syae053","url":null,"abstract":"In vicariant species formation, divergence results primarily from periods of allopatry and restricted gene flow. Widespread species harboring differentiated, geographically distinct sublineages offer a window into what may be a common mode of species formation, whereby a species originates, spreads across the landscape, then fragments into multiple units. However, incipient lineages usually lack reproductive barriers that prevent their fusion upon secondary contact, blurring the boundaries between a single, large metapopulation-level lineage and multiple independent species. Here we explore this model of species formation in the Eastern Red-backed Salamander (Plethodon cinereus), a widespread terrestrial vertebrate with at least six divergent mitochondrial clades throughout its range. Using anchored hybrid enrichment data, we applied phylogenomic and population genomic approaches to investigate patterns of divergence, gene flow, and secondary contact. Genomic data broadly match most mitochondrial groups but reveal mitochondrial introgression and extensive admixture at several contact zones. While species delimitation analyses in BPP supported five lineages of P. cinereus, genealogical divergence indices (gdi) were highly sensitive to the inclusion of admixed samples and the geographic representation of candidate species, with increasing support for multiple species when removing admixed samples or limiting sampling to a single locality per group. 
An analysis of morphometric data revealed differences in body size and limb proportions among groups, with a reduction of forelimb length among warmer and drier localities consistent with increased fossoriality. We conclude that P. cinereus is a single species, but one with highly structured component lineages of various degrees of independence.","PeriodicalId":22120,"journal":{"name":"Systematic Biology","volume":"63 1","pages":""},"PeriodicalIF":6.5,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142160434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Toby G L Kovacs, James Walker, Simon Hellemans, Thomas Bourguignon, Nikolai J Tatarnic, Jane M McRae, Simon Y W Ho, Nathan Lo
{"title":"Dating in the Dark: Elevated Substitution Rates in Cave Cockroaches (Blattodea: Nocticolidae) Have Negative Impacts on Molecular Date Estimates.","authors":"Toby G L Kovacs, James Walker, Simon Hellemans, Thomas Bourguignon, Nikolai J Tatarnic, Jane M McRae, Simon Y W Ho, Nathan Lo","doi":"10.1093/sysbio/syae002","DOIUrl":"https://doi.org/10.1093/sysbio/syae002","url":null,"abstract":"Rates of nucleotide substitution vary substantially across the Tree of Life, with potentially confounding effects on phylogenetic and evolutionary analyses. A large acceleration in mitochondrial substitution rate occurs in the cockroach family Nocticolidae, which predominantly inhabit subterranean environments. To evaluate the impacts of this among-lineage rate heterogeneity on estimates of phylogenetic relationships and evolutionary timescales, we analyzed nuclear ultraconserved elements (UCEs) and mitochondrial genomes from nocticolids and other cockroaches. Substitution rates were substantially elevated in nocticolid lineages compared with other cockroaches, especially in mitochondrial protein-coding genes. This disparity in evolutionary rates is likely to have led to different evolutionary relationships being supported by phylogenetic analyses of mitochondrial genomes and UCE loci. Furthermore, Bayesian dating analyses using relaxed-clock models inferred much deeper divergence times compared with a flexible local clock. Our phylogenetic analysis of UCEs, which is the first genome-scale study to include all 13 major cockroach families, unites Corydiidae and Nocticolidae and places Anaplectidae as the sister lineage to the rest of Blattoidea. We uncover an extraordinary level of genetic divergence in Nocticolidae, including two highly distinct clades that separated ~115 million years ago despite both containing representatives of the genus Nocticola. 
The results of our study highlight the potential impacts of high among-lineage rate variation on estimates of phylogenetic relationships and evolutionary timescales.","PeriodicalId":22120,"journal":{"name":"Systematic Biology","volume":" ","pages":"532-545"},"PeriodicalIF":6.1,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11377191/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139698361","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alexander E Fedosov, Paul Zaharias, Thomas Lemarcis, Maria Vittoria Modica, Mandë Holford, Marco Oliverio, Yuri I Kantor, Nicolas Puillandre
{"title":"Phylogenomics of Neogastropoda: The Backbone Hidden in the Bush.","authors":"Alexander E Fedosov, Paul Zaharias, Thomas Lemarcis, Maria Vittoria Modica, Mandë Holford, Marco Oliverio, Yuri I Kantor, Nicolas Puillandre","doi":"10.1093/sysbio/syae010","DOIUrl":"https://doi.org/10.1093/sysbio/syae010","url":null,"abstract":"The molluskan order Neogastropoda encompasses over 15,000 almost exclusively marine species playing important roles in benthic communities and in the economies of coastal countries. Neogastropoda underwent intensive cladogenesis in the early stages of diversification, generating a \"bush\" at the base of their evolutionary tree, which has been hard to resolve even with high throughput molecular data. In the present study to resolve the bush, we use a variety of phylogenetic inference methods and a comprehensive exon capture dataset of 1817 loci (79.6% data occupancy) comprising 112 taxa of 48 out of 60 Neogastropoda families. Our results show consistent topologies and high support in all analyses at (super)family level, supporting monophyly of Muricoidea, Mitroidea, Conoidea, and, with some reservations, Olivoidea and Buccinoidea. Volutoidea and Turbinelloidea as currently circumscribed are clearly paraphyletic. Despite our analyses consistently resolving most backbone nodes, 3 prove problematic: First, the uncertain placement of Cancellariidae, as the sister group to either a Ficoidea-Tonnoidea clade or to the rest of Neogastropoda, leaves monophyly of Neogastropoda unresolved. Second, relationships are contradictory at the base of the major \"core Neogastropoda\" grouping. Third, coalescence-based analyses reject monophyly of the Buccinoidea in relation to Vasidae. We analyzed phylogenetic signal of targeted loci in relation to potential biases, and we propose the most probable resolutions in the latter 2 recalcitrant nodes. 
The uncertain placement of Cancellariidae may be explained by orthology violations due to differential paralog loss shortly after the whole genome duplication, which should be resolved with a curated set of longer loci.","PeriodicalId":22120,"journal":{"name":"Systematic Biology","volume":" ","pages":"521-531"},"PeriodicalIF":6.1,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11377187/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140060479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gustavo S de Miranda, Siddharth S Kulkarni, Jéssica Tagliatela, Caitlin M Baker, Alessandro P L Giupponi, Facundo M Labarque, Efrat Gavish-Regev, Michael G Rix, Leonardo S Carvalho, Lívia Maria Fusari, Mark S Harvey, Hannah M Wood, Prashant P Sharma
{"title":"The Rediscovery of a Relict Unlocks the First Global Phylogeny of Whip Spiders (Amblypygi).","authors":"Gustavo S de Miranda, Siddharth S Kulkarni, Jéssica Tagliatela, Caitlin M Baker, Alessandro P L Giupponi, Facundo M Labarque, Efrat Gavish-Regev, Michael G Rix, Leonardo S Carvalho, Lívia Maria Fusari, Mark S Harvey, Hannah M Wood, Prashant P Sharma","doi":"10.1093/sysbio/syae021","DOIUrl":"https://doi.org/10.1093/sysbio/syae021","url":null,"abstract":"Asymmetrical rates of cladogenesis and extinction abound in the tree of life, resulting in numerous minute clades that are dwarfed by larger sister groups. Such taxa are commonly regarded as phylogenetic relicts or \"living fossils\" when they exhibit an ancient first appearance in the fossil record and prolonged external morphological stasis, particularly in comparison to their more diversified sister groups. Due to their special status, various phylogenetic relicts tend to be well-studied and prioritized for conservation. A notable exception to this trend is found within Amblypygi (\"whip spiders\"), a visually striking order of functionally hexapodous arachnids that are notable for their antenniform first walking leg pair (the eponymous \"whips\"). Paleoamblypygi, the putative sister group to the remaining Amblypygi, is known from Late Carboniferous and Eocene deposits but is survived by a single living species, Paracharon caecus Hansen (1921), that was last collected in 1899. Due to the absence of genomic sequence-grade tissue for this vital taxon, there is no global molecular phylogeny for Amblypygi to date, nor a fossil-calibrated estimation of divergences within the group. Here, we report a previously unknown species of Paleoamblypygi from a cave site in Colombia. Capitalizing upon this discovery, we generated the first molecular phylogeny of Amblypygi, integrating ultraconserved element sequencing with legacy Sanger datasets and including described extant genera. 
To quantify the impact of sampling Paleoamblypygi on divergence time estimation, we performed in silico experiments with pruning of Paracharon. We demonstrate that the omission of relicts has a significant impact on the accuracy of node dating approaches that outweighs the impact of excluding ingroup fossils, which bears upon the ancestral range reconstruction for the group. Our results underscore the imperative for biodiversity discovery efforts in elucidating the phylogenetic relationships of \"dark taxa,\" and especially phylogenetic relicts in tropical and subtropical habitats. The lack of reciprocal monophyly for Charontidae and Charinidae leads us to subsume them into one family, Charontidae, new synonymy.","PeriodicalId":22120,"journal":{"name":"Systematic Biology","volume":" ","pages":"495-505"},"PeriodicalIF":6.1,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140908807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Andrew F Magee, Andrew J Holbrook, Jonathan E Pekar, Itzue W Caviedes-Solis, Fredrick A Matsen IV, Guy Baele, Joel O Wertheim, Xiang Ji, Philippe Lemey, Marc A Suchard
{"title":"Random-Effects Substitution Models for Phylogenetics via Scalable Gradient Approximations.","authors":"Andrew F Magee, Andrew J Holbrook, Jonathan E Pekar, Itzue W Caviedes-Solis, Fredrick A Matsen IV, Guy Baele, Joel O Wertheim, Xiang Ji, Philippe Lemey, Marc A Suchard","doi":"10.1093/sysbio/syae019","DOIUrl":"https://doi.org/10.1093/sysbio/syae019","url":null,"abstract":"Phylogenetic and discrete-trait evolutionary inference depend heavily on an appropriate characterization of the underlying character substitution process. In this paper, we present random-effects substitution models that extend common continuous-time Markov chain models into a richer class of processes capable of capturing a wider variety of substitution dynamics. As these random-effects substitution models often require many more parameters than their usual counterparts, inference can be both statistically and computationally challenging. Thus, we also propose an efficient approach to compute an approximation to the gradient of the data likelihood with respect to all unknown substitution model parameters. We demonstrate that this approximate gradient enables scaling of sampling-based inference, namely Bayesian inference via Hamiltonian Monte Carlo, under random-effects substitution models across large trees and state-spaces. Applied to a dataset of 583 SARS-CoV-2 sequences, an HKY model with random-effects shows strong signals of nonreversibility in the substitution process, and posterior predictive model checks clearly show that it is a more adequate model than a reversible model. When analyzing the pattern of phylogeographic spread of 1441 influenza A virus (H3N2) sequences between 14 regions, a random-effects phylogeographic substitution model infers that air travel volume adequately predicts almost all dispersal rates. A random-effects state-dependent substitution model reveals no evidence for an effect of arboreality on the swimming mode in the tree frog subfamily Hylinae. 
Simulations reveal that random-effects substitution models can accommodate both negligible and radical departures from the underlying base substitution model. We show that our gradient-based inference approach is over an order of magnitude more time efficient than conventional approaches.","PeriodicalId":22120,"journal":{"name":"Systematic Biology","volume":" ","pages":"562-578"},"PeriodicalIF":6.1,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11498053/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140869958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}