{"title":"Testing relationships between multiple regional features and biogeographic processes of speciation, extinction, and dispersal","authors":"Sarah K Swiston, Michael J Landis","doi":"10.1093/sysbio/syae062","DOIUrl":"https://doi.org/10.1093/sysbio/syae062","url":null,"abstract":"The spatial and environmental features of regions where clades are evolving are expected to impact biogeographic processes such as speciation, extinction, and dispersal. Any number of regional features (such as elevation, distance, area, etc.) may be directly or indirectly related to these processes. For example, it may be that distances or differences in elevation or both may limit dispersal rates. However, it is difficult to disentangle which features are most strongly related to rates of different processes. Here, we present an extensible Multi-feature Feature-Informed GeoSSE (MultiFIG) model that allows for the simultaneous investigation of any number of regional features. MultiFIG provides a conceptual framework for incorporating large numbers of features of different types, including categorical, quantitative, within-region, and between-region features, along with a mathematical framework for translating those features into biogeographic rates for statistical hypothesis testing. Using traditional Bayesian parameter estimation and reversible-jump Markov chain Monte Carlo, MultiFIG allows for the exploration of models with different numbers and combinations of feature-effect parameters, and generates estimates for the strengths of relationships between each regional feature and core process. We validate this model with a simulation study covering a range of scenarios with different numbers of regions, tree sizes, and feature values. We also demonstrate the application of MultiFIG with an empirical case study of the South American lizard genus Liolaemus, investigating sixteen regional features related to area, distance, and elevation. 
Our results show two important feature-process relationships: a negative distance/dispersal relationship, and a negative area/extinction relationship. Interestingly, although speciation rates were found to be higher in Andean versus non-Andean regions, the model did not assign significance to Andean- or elevation-related parameters. These results highlight the need to consider multiple regional features in biogeographic hypothesis testing.","PeriodicalId":22120,"journal":{"name":"Systematic Biology","volume":"191 1","pages":""},"PeriodicalIF":6.5,"publicationDate":"2024-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142678579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sebastian Höhna, Sarah E Lower, Pablo Duchen, Ana Catalán
{"title":"Robustness of Divergence Time Estimation Despite Gene Tree Estimation Error: A Case Study of Fireflies (Coleoptera: Lampyridae)","authors":"Sebastian Höhna, Sarah E Lower, Pablo Duchen, Ana Catalán","doi":"10.1093/sysbio/syae065","DOIUrl":"https://doi.org/10.1093/sysbio/syae065","url":null,"abstract":"Genomic data has become ubiquitous in phylogenomic studies, including divergence time estimation, but provides new challenges. These challenges include, amongst others, biological gene tree discordance, methodological gene tree estimation error, and computational limitations on performing full Bayesian inference under complex models. In this study, we use a recently published firefly (Coleoptera: Lampyridae) anchored hybrid enrichment dataset (AHE; 436 loci for 88 Lampyridae species and 10 outgroup species) as a case study to explore gene tree estimation error and the robustness of divergence time estimation. First, we explored the amount of model violation using posterior predictive simulations because model violations are likely to bias phylogenetic inferences and produce gene tree estimation error. We specifically focused on missing data (either uniformly distributed or systematically) and the distribution of highly variable and conserved sites (either uniformly distributed or clustered). Our assessment of model adequacy showed that standard phylogenetic substitution models are not adequate for any of the 436 AHE loci. We tested whether the model violations and alignment errors indeed resulted in gene tree estimation error by comparing the observed gene tree discordance to simulated gene tree discordance under the multispecies coalescent model. Thus, we show that the inferred gene tree discordance is not only due to biological mechanisms but primarily due to inference errors. Lastly, we explored whether divergence time estimation is robust despite the observed gene tree estimation error. 
We selected four subsets of the full AHE dataset, concatenated each subset and performed a Bayesian relaxed clock divergence estimation in RevBayes. The estimated divergence times overlapped for all nodes that are shared between the topologies. Thus, divergence time estimation is robust using any well selected data subset as long as the topology inference is robust.","PeriodicalId":22120,"journal":{"name":"Systematic Biology","volume":"20 1","pages":""},"PeriodicalIF":6.5,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142610475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fábio K Mendes, Remco Bouckaert, Luiz M Carvalho, Alexei J Drummond
{"title":"How to validate a Bayesian evolutionary model.","authors":"Fábio K Mendes, Remco Bouckaert, Luiz M Carvalho, Alexei J Drummond","doi":"10.1093/sysbio/syae064","DOIUrl":"10.1093/sysbio/syae064","url":null,"abstract":"Biology has become a highly mathematical discipline in which probabilistic models play a central role. As a result, research in the biological sciences is now dependent on computational tools capable of carrying out complex analyses. These tools must be validated before they can be used, but what is understood as validation varies widely among methodological contributions. This may be a consequence of the still embryonic stage of the literature on statistical software validation for computational biology. Our manuscript aims to advance this literature. Here, we describe, illustrate and introduce new good practices for assessing the correctness of a model implementation, with an emphasis on Bayesian methods. We also introduce a suite of functionalities for automating validation protocols. It is our hope that the guidelines presented here help sharpen the focus of discussions on (as well as elevate) expected standards of statistical software for biology.","PeriodicalId":22120,"journal":{"name":"Systematic Biology","volume":" ","pages":""},"PeriodicalIF":6.1,"publicationDate":"2024-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142590679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alison R Irwin, Nicholas W Roberts, Ellen E Strong, Yasunori Kano, Daniel I Speiser, Elizabeth M Harper, Suzanne T Williams
{"title":"Evolution of Large Eyes in Stromboidea (Gastropoda): Impact of Photic Environment and Life History Traits.","authors":"Alison R Irwin, Nicholas W Roberts, Ellen E Strong, Yasunori Kano, Daniel I Speiser, Elizabeth M Harper, Suzanne T Williams","doi":"10.1093/sysbio/syae063","DOIUrl":"https://doi.org/10.1093/sysbio/syae063","url":null,"abstract":"Eyes within the marine gastropod superfamily Stromboidea range widely in size, from 0.2 to 2.3 mm - the largest eyes known in any gastropod. Despite this interesting variation, the underlying evolutionary pressures remain unknown. Here, we use the wealth of material available in museum collections to explore the evolution of stromboid eye size and structure. Our results suggest that depth is a key light-limiting factor in stromboid eye evolution; here, increasing water depth is correlated with increasing aperture width relative to lens diameter, and therefore an increasing investment in sensitivity in dim light environments. In the major clade containing all large-eyed stromboid families, species observed active during the day and the night had wider eye apertures relative to lens sizes than species observed active during the day only, thereby prioritising sensitivity over resolution. Species with no consistent diel activity pattern also had smaller body sizes than exclusively day-active species, which may suggest that smaller animals are more vulnerable to shell-crushing predators, and avoid the higher predation pressure experienced by animals active during the day. Within the same major clade, ancestral state reconstruction suggests that absolute eye size increased above 1 mm twice. 
The unresolved position of Varicospira, however, weakens this hypothesis and further work with additional markers is needed to confirm this result.","PeriodicalId":22120,"journal":{"name":"Systematic Biology","volume":" ","pages":""},"PeriodicalIF":6.1,"publicationDate":"2024-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142584383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shifang Mo, Yaowei Zhu, Mariana P Braga, David J Lohman, Sören Nylin, Ashraf Moumou, Christopher W Wheat, Niklas Wahlberg, Min Wang, Fangzhou Ma, Peng Zhang, Houshuai Wang
{"title":"Rapid Evolution of Host Repertoire and Geographic Range in a Young and Diverse Genus of Montane Butterflies.","authors":"Shifang Mo, Yaowei Zhu, Mariana P Braga, David J Lohman, Sören Nylin, Ashraf Moumou, Christopher W Wheat, Niklas Wahlberg, Min Wang, Fangzhou Ma, Peng Zhang, Houshuai Wang","doi":"10.1093/sysbio/syae061","DOIUrl":"https://doi.org/10.1093/sysbio/syae061","url":null,"abstract":"Evolutionary changes in geographic distribution and larval host plants may promote the rapid diversification of montane insects, but this scenario has been rarely investigated. We studied rapid radiation of the butterfly genus Colias, which has diversified in mountain ecosystems in Eurasia, Africa, and the Americas. Based on a dataset of 150 nuclear protein-coding genetic loci and mitochondrial genomes, we constructed a time-calibrated phylogenetic tree of Colias species with broad taxon sampling. We then inferred their ancestral geographic ranges, historical diversification rates, and the evolution of host use. We found that the most recent common ancestor of Colias was likely geographically widespread and originated ~3.5 Ma. The group subsequently diversified in different regions across the world, often in tandem with geographic expansion events. No aspect of elevation was found to have a direct effect on diversification. The genus underwent a burst of diversification soon after the divergence of the Neotropical lineage, followed by an exponential decline in diversification rate toward the present. The ancestral host repertoire included the legume genera Astragalus and Trifolium but later expanded to include a wide range of Fabaceae genera and plants in more distantly related families, punctuated with periods of host range expansion and contraction. 
We suggest that the widespread distribution of the ancestor of all extant Colias lineages set the stage for diversification by isolation of populations that locally adapted to the various different environments they encountered, including different host plants. In this scenario, elevation is not the main driver but might have accelerated diversification by isolating populations.","PeriodicalId":22120,"journal":{"name":"Systematic Biology","volume":" ","pages":""},"PeriodicalIF":6.1,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142558837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Benjamin M Titus, H Lisle Gibbs, Nuno Simões, Marymegan Daly
{"title":"Topology Testing and Demographic Modeling Illuminate a Novel Speciation Pathway in the Greater Caribbean Sea Following the Formation of the Isthmus of Panama.","authors":"Benjamin M Titus, H Lisle Gibbs, Nuno Simões, Marymegan Daly","doi":"10.1093/sysbio/syae045","DOIUrl":"10.1093/sysbio/syae045","url":null,"abstract":"Recent genomic analyses have highlighted the prevalence of speciation with gene flow in many taxa and have underscored the importance of accounting for these reticulate evolutionary processes when constructing species trees and generating parameter estimates. This is especially important for deepening our understanding of speciation in the sea where fast-moving ocean currents, expanses of deep water, and periodic episodes of sea level rise and fall act as soft and temporary allopatric barriers that facilitate both divergence and secondary contact. Under these conditions, gene flow is not expected to cease completely while contemporary distributions are expected to differ from historical ones. Here, we conduct range-wide sampling for Pederson's cleaner shrimp (Ancylomenes pedersoni), a species complex from the Greater Caribbean that contains three clearly delimited mitochondrial lineages with both allopatric and sympatric distributions. Using mtDNA barcodes and a genomic ddRADseq approach, we combine classic phylogenetic analyses with extensive topology testing and demographic modeling (10 site frequency replicates × 45 evolutionary models × 50 model simulations/replicate = 22,500 simulations) to test species boundaries and reconstruct the evolutionary history of what was expected to be a simple case study. Instead, our results indicate a history of allopatric divergence, secondary contact, introgression, and endemic hybrid speciation that we hypothesize was driven by the final closure of the Isthmus of Panama and the strengthening of the Gulf Stream Current ~3.5 Ma. 
The history of this species complex recovered by model-based methods that allow reticulation differs from that recovered by standard phylogenetic analyses and is unexpected given contemporary distributions. The geologically and biologically meaningful insights gained by our model selection analyses illuminate what is likely a novel pathway of species formation not previously documented that resulted from one of the most biogeographically significant events in Earth's history.","PeriodicalId":22120,"journal":{"name":"Systematic Biology","volume":" ","pages":"758-768"},"PeriodicalIF":6.1,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141749074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Basanta Khakurel, Courtney Grigsby, Tyler D Tran, Juned Zariwala, Sebastian Höhna, April M Wright
{"title":"The Fundamental Role of Character Coding in Bayesian Morphological Phylogenetics.","authors":"Basanta Khakurel, Courtney Grigsby, Tyler D Tran, Juned Zariwala, Sebastian Höhna, April M Wright","doi":"10.1093/sysbio/syae033","DOIUrl":"10.1093/sysbio/syae033","url":null,"abstract":"Phylogenetic trees establish a historical context for the study of organismal form and function. Most phylogenetic trees are estimated using a model of evolution. For molecular data, modeling evolution is often based on biochemical observations about changes between character states. For example, there are 4 nucleotides, and we can make assumptions about the probability of transitions between them. By contrast, for morphological characters, we may not know a priori how many character states there are per character, as both extant sampling and the fossil record may be highly incomplete, which leads to an observer bias. For a given character, the state space may be larger than what has been observed in the sample of taxa collected by the researcher. In this case, how many evolutionary rates are needed to even describe transitions between morphological character states may not be clear, potentially leading to model misspecification. To explore the impact of this model misspecification, we simulated character data with varying numbers of character states per character. We then used the data to estimate phylogenetic trees using models of evolution with the correct number of character states and an incorrect number of character states. The results of this study indicate that this observer bias may lead to phylogenetic error, particularly in the branch lengths of trees. 
If the state space is wrongly assumed to be too large, then we underestimate the branch lengths, and the opposite occurs when the state space is wrongly assumed to be too small.","PeriodicalId":22120,"journal":{"name":"Systematic Biology","volume":" ","pages":"861-871"},"PeriodicalIF":6.1,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141535331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Danielle K Herrig, Ryan D Ridenbaugh, Kim L Vertacnik, Kathryn M Everson, Sheina B Sim, Scott M Geib, David W Weisrock, Catherine R Linnen
{"title":"Whole Genomes Reveal Evolutionary Relationships and Mechanisms Underlying Gene-Tree Discordance in Neodiprion Sawflies.","authors":"Danielle K Herrig, Ryan D Ridenbaugh, Kim L Vertacnik, Kathryn M Everson, Sheina B Sim, Scott M Geib, David W Weisrock, Catherine R Linnen","doi":"10.1093/sysbio/syae036","DOIUrl":"10.1093/sysbio/syae036","url":null,"abstract":"Rapidly evolving taxa are excellent models for understanding the mechanisms that give rise to biodiversity. However, developing an accurate historical framework for comparative analysis of such lineages remains a challenge due to ubiquitous incomplete lineage sorting (ILS) and introgression. Here, we use a whole-genome alignment, multiple locus-sampling strategies, and summary-tree and single nucleotide polymorphism-based species-tree methods to infer a species tree for eastern North American Neodiprion species, a clade of pine-feeding sawflies (Order: Hymenoptera; Family: Diprionidae). We recovered a well-supported species tree that, except for three uncertain relationships, was robust to different strategies for analyzing whole-genome data. Nevertheless, underlying gene-tree discordance was high. To understand this genealogical variation, we used multiple linear regression to model site concordance factors estimated in 50-kb windows as a function of several genomic predictor variables. We found that site concordance factors tended to be higher in regions of the genome with more parsimony-informative sites, fewer singletons, less missing data, lower GC content, more genes, lower recombination rates, and lower D-statistics (less introgression). Together, these results suggest that ILS, introgression, and genotyping error all shape the genomic landscape of gene-tree discordance in Neodiprion. 
More generally, our findings demonstrate how combining phylogenomic analysis with knowledge of local genomic features can reveal mechanisms that produce topological heterogeneity across genomes.","PeriodicalId":22120,"journal":{"name":"Systematic Biology","volume":" ","pages":"839-860"},"PeriodicalIF":6.1,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141545293","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Daniele Silvestro, Thibault Latrille, Nicolas Salamin
{"title":"Toward a Semi-Supervised Learning Approach to Phylogenetic Estimation.","authors":"Daniele Silvestro, Thibault Latrille, Nicolas Salamin","doi":"10.1093/sysbio/syae029","DOIUrl":"10.1093/sysbio/syae029","url":null,"abstract":"Models have always been central to inferring molecular evolution and to reconstructing phylogenetic trees. Their use typically involves the development of a mechanistic framework reflecting our understanding of the underlying biological processes, such as nucleotide substitutions, and the estimation of model parameters by maximum likelihood or Bayesian inference. However, deriving and optimizing the likelihood of the data is not always possible under complex evolutionary scenarios or even tractable for large datasets, often leading to unrealistic simplifying assumptions in the fitted models. To overcome this issue, we coupled stochastic simulations of genome evolution with a new supervised deep-learning model to infer key parameters of molecular evolution. Our model is designed to directly analyze multiple sequence alignments and estimate per-site evolutionary rates and divergence without requiring a known phylogenetic tree. The accuracy of our predictions matched that of likelihood-based phylogenetic inference when rate heterogeneity followed a simple gamma distribution, but it strongly exceeded it under more complex patterns of rate variation, such as codon models. Our approach is highly scalable and can be efficiently applied to genomic data, as we showed on a dataset of 26 million nucleotides from the clownfish clade. Our simulations also showed that the integration of per-site rates obtained by deep learning within a Bayesian framework led to significantly more accurate phylogenetic inference, particularly with respect to the estimated branch lengths. 
We thus propose that future advancements in phylogenetic analysis will benefit from a semi-supervised learning approach that combines deep-learning estimation of substitution rates, which allows for more flexible models of rate variation, and probabilistic inference of the phylogenetic tree, which guarantees interpretability and a rigorous assessment of statistical support.","PeriodicalId":22120,"journal":{"name":"Systematic Biology","volume":" ","pages":"789-806"},"PeriodicalIF":6.1,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141447158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Expectation-Maximization enables Phylogenetic Dating under a Categorical Rate Model.","authors":"Uyen Mai, Eduardo Charvel, Siavash Mirarab","doi":"10.1093/sysbio/syae034","DOIUrl":"10.1093/sysbio/syae034","url":null,"abstract":"Dating phylogenetic trees to obtain branch lengths in time units is essential for many downstream applications but has remained challenging. Dating requires inferring substitution rates that can change across the tree. While we can assume to have information about a small subset of nodes from the fossil record or sampling times (for fast-evolving organisms), inferring the ages of the other nodes essentially requires extrapolation and interpolation. Assuming a distribution of branch rates, we can formulate dating as a constrained maximum likelihood (ML) estimation problem. While ML dating methods exist, their accuracy degrades in the face of model misspecification, where the assumed parametric statistical distribution of branch rates vastly differs from the true distribution. Notably, most existing methods assume rigid, often unimodal, branch rate distributions. A second challenge is that the likelihood function involves an integral over the continuous domain of the rates, often leading to difficult non-convex optimization problems. To tackle both challenges, we propose a new method called Molecular Dating using Categorical-models (MD-Cat). MD-Cat uses a categorical model of rates inspired by non-parametric statistics and can approximate a large family of models by discretizing the rate distribution into k categories. Under this model, we can use the Expectation-Maximization algorithm to co-estimate rate categories and branch lengths in time units. Our model has fewer assumptions about the true distribution of branch rates than parametric models such as Gamma or LogNormal distribution. 
Our results on two simulated and real datasets of Angiosperms and HIV and a wide selection of rate distributions show that MD-Cat is often more accurate than the alternatives, especially on datasets with exponential or multimodal rate distributions.","PeriodicalId":22120,"journal":{"name":"Systematic Biology","volume":" ","pages":"823-838"},"PeriodicalIF":6.1,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11524793/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141545291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}