Effective Population Size Estimation in Large Marine Populations: Considering Current Challenges and Opportunities When Simulating Large Data Sets With High-Density Genomic Information
Chrystelle Delord, Sophie Arnaud-Haond, Agostino Leone, Ekaterina Noskova, Rémi Tournebize, Patrick Jacques, Francis Marsac, Natacha Nikolic
Evolutionary Applications 18(8), published 2025-07-28. DOI: 10.1111/eva.70121. https://onlinelibrary.wiley.com/doi/10.1111/eva.70121
Abstract
Next-generation sequencing has broadened perspectives on estimating the effective population size (Ne) by providing high-density genomic information. These technologies have expanded data collection and analytical tools in population genetics, improving the understanding of highly abundant populations, such as marine species of high commercial or conservation priority. Several common methods for estimating Ne are based on allele frequency spectra or on linkage disequilibrium between loci. However, their specific constraints make them difficult to apply to large populations, especially in the presence of confounding factors such as migration, complex sampling schemes or non-independence between loci. Computer simulations have long been invaluable tools for exploring the influence of biological or logistical factors on Ne estimation and for assessing the robustness of dedicated methods. Here, we outline several Ne estimation methods, their foundational principles and requirements, and the likely caveats of applying them to populations of high abundance. We then present a simulation framework built upon recent computational genomic tools that combine the ability to generate biologically realistic data sets with realistic patterns of long-term neutral genetic diversity. This framework aims to reproduce and track the main critical features of data derived from a large natural population when running a simulation-based population genetics study, for example, when evaluating the strengths and limitations of various Ne estimation methods. We illustrate this framework by generating genotype data sets with varying sample sizes and numbers of loci and analysing them with three software tools (NeEstimator2, GONE and GADMA). Detailed and annotated simulation scripts are provided to ensure reproducibility and to support future research on Ne estimation. These resources can support method comparisons and validations, particularly for non-specialists, such as conservation practitioners and students.
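As a rough illustration of why linkage-disequilibrium methods struggle with large populations, the sketch below uses the classical first-order approximation for unlinked loci under random mating, E[r²] ≈ 1/(3Ne) + 1/S, where S is the number of sampled individuals. This simplified formula and the function names are assumptions made for illustration. They are not taken from the article and do not reproduce the bias-corrected estimators implemented in NeEstimator2 or GONE.

```python
import math


def drift_to_sampling_ratio(ne: float, sample_size: int) -> float:
    """Ratio of the drift signal 1/(3*Ne) to the sampling noise 1/S.

    Assumes unlinked loci and the first-order approximation
    E[r^2] ~ 1/(3*Ne) + 1/S (illustrative simplification).
    """
    drift_term = 1.0 / (3.0 * ne)
    sampling_term = 1.0 / sample_size
    return drift_term / sampling_term


def ne_from_mean_r2(mean_r2: float, sample_size: int) -> float:
    """Invert the approximation above to get a naive Ne estimate.

    Returns infinity when the observed mean r^2 does not exceed the
    sampling expectation 1/S, i.e. when no drift signal is detectable.
    """
    drift_term = mean_r2 - 1.0 / sample_size
    if drift_term <= 0.0:
        return math.inf
    return 1.0 / (3.0 * drift_term)


if __name__ == "__main__":
    # Hypothetical scenarios: the same sample of 50 individuals drawn
    # from populations with increasingly large Ne.
    for ne in (1e2, 1e4, 1e6):
        ratio = drift_to_sampling_ratio(ne, sample_size=50)
        print(f"Ne={ne:.0e}  drift/sampling ratio={ratio:.2e}")
```

Under these assumptions the drift signal shrinks in proportion to 1/Ne while the sampling term stays fixed at 1/S, so for very large Ne a small error in the mean r² can push the naive estimate to infinity. Simulated data sets of known Ne offer one way to quantify that behaviour.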
About the journal:
Evolutionary Applications is a fully peer-reviewed open access journal. It publishes papers that utilize concepts from evolutionary biology to address biological questions of health, social and economic relevance. Papers are expected to employ evolutionary concepts or methods to make contributions to areas such as (but not limited to): medicine, agriculture, forestry, exploitation and management (fisheries and wildlife), aquaculture, conservation biology, environmental sciences (including climate change and invasion biology), microbiology, and toxicology. All taxonomic groups are covered, from microbes and fungi to plants and animals. To better serve the community, the journal also strongly encourages submissions of papers that use modern molecular and genetic methods (population and functional genomics, transcriptomics, proteomics, epigenetics, quantitative genetics, association and linkage mapping) to address important questions in any of these disciplines within an applied evolutionary framework. Theoretical, empirical, synthesis or perspective papers are welcome.