{"title":"Maximising the Potential of Temporal \n \n \n \n N\n \n e\n \n \n \n Estimation in Long-Term Population Monitoring Programmes","authors":"Tin-Yu J. Hui","doi":"10.1111/1755-0998.14125","DOIUrl":null,"url":null,"abstract":"<p>Effective population size (<span></span><math>\n <semantics>\n <mrow>\n <msub>\n <mi>N</mi>\n <mi>e</mi>\n </msub>\n </mrow>\n </semantics></math>) is indisputably one of the most important parameters in evolutionary biology. It governs the rate of evolution, magnitude of drift, effectiveness of selection, diversity, and many more. It also serves as a key indicator to inform population monitoring programmes, from conservation of endangered species to biocontrol of agricultural pests or disease vectors, and almost everything in between. In applications in which the contemporary <span></span><math>\n <semantics>\n <mrow>\n <msub>\n <mi>N</mi>\n <mi>e</mi>\n </msub>\n </mrow>\n </semantics></math> is of interest, temporal <span></span><math>\n <semantics>\n <mrow>\n <mi>F</mi>\n </mrow>\n </semantics></math> is one of the most widely used estimators. It measures the magnitude of drift among genetically neutral loci to estimate the harmonic mean <span></span><math>\n <semantics>\n <mrow>\n <msub>\n <mi>N</mi>\n <mi>e</mi>\n </msub>\n </mrow>\n </semantics></math> between two time points. In this issue, Waples et al. (<span>2025</span>) present us a new software “MAXTEMP” to improve the precision of temporal <span></span><math>\n <semantics>\n <mrow>\n <mi>F</mi>\n </mrow>\n </semantics></math> by incorporating additional samples outside of the focal period.</p><p>I was sceptical at first as these seemingly unrelated samples appear to be uninformative, especially after I had revisited Waples (<span>2005</span>) on the time periods at which the temporal <span></span><math>\n <semantics>\n <mrow>\n <msub>\n <mi>N</mi>\n <mi>e</mi>\n </msub>\n </mrow>\n </semantics></math> estimates apply. Upon closer inspection, the authors reassured us with an intuitive yet robust argument: consider a population monitoring programme with initially two temporal samples sandwiching the focal period. A direct <span></span><math>\n <semantics>\n <mrow>\n <msub>\n <mi>N</mi>\n <mi>e</mi>\n </msub>\n </mrow>\n </semantics></math> estimate is obtained via temporal <span></span><math>\n <semantics>\n <mrow>\n <mi>F</mi>\n </mrow>\n </semantics></math> in the traditional way (Waples <span>1989</span>). A third sample is subsequently collected, and the same <span></span><math>\n <semantics>\n <mrow>\n <msub>\n <mi>N</mi>\n <mi>e</mi>\n </msub>\n </mrow>\n </semantics></math> can also be implied from the difference of the overall <span></span><math>\n <semantics>\n <mrow>\n <mi>F</mi>\n </mrow>\n </semantics></math> and that between the second and third time points. There exists a weighted average (linear combination) of the two that outperforms either estimate by having a lower variance. The idea can be generalised to an arbitrary number of additional samples from which the implied estimates are calculated. The remaining challenge is to find an appropriate weighing system which the authors duly examined.</p><p>From a technical perspective, the direct and implied estimates share the same expectation hence the combined one remains largely unbiased. Without any additional cost (apart from sampling more individuals) MAXTEMP brings the benefit of variance reduction, whose magnitude depends on individual variances as well as their pairwise correlation. 
The implied estimates often come with much larger variances given they span across longer horizons. As all estimates aim to extract the same drift signal for the focal period positive correlations are unavoidably induced, limiting the potential to shrink the variance further. Despite these mathematical constraints the authors demonstrated a reduction in the standard deviation of <span></span><math>\n <semantics>\n <mrow>\n <mi>F</mi>\n </mrow>\n </semantics></math> by up to 50% in selected cases. In my experience, the correlation weakens when the sample size <span></span><math>\n <semantics>\n <mrow>\n <mi>s</mi>\n </mrow>\n </semantics></math> is limited, as sampling noise colludes with the underlying drift. This is hugely encouraging that the improvement is the greatest when it is most needed. While it is tempting to include as many additional samples as possible, the authors recommended to bring no more than one from either side or information will become saturated. It is also possible to couple temporal <span></span><math>\n <semantics>\n <mrow>\n <mi>F</mi>\n </mrow>\n </semantics></math> with single-sample estimates (e.g., Linkage Disequilibrium among unlinked loci) to resolve mystery <span></span><math>\n <semantics>\n <mrow>\n <msub>\n <mi>N</mi>\n <mi>e</mi>\n </msub>\n </mrow>\n </semantics></math>'s of unsampled generations.</p><p>In parallel with temporal <span></span><math>\n <semantics>\n <mrow>\n <mi>F</mi>\n </mrow>\n </semantics></math>, the maximum likelihood (ML) approaches are being utilised to extract the same drift signal by considering the distributional change in allele frequency over time (Williamson and Slatkin <span>1999</span>; Hui and Burt <span>2015</span>). Both Waples et al. and I discovered that ML also benefits from having additional samples almost in the same way as the moment-based <span></span><math>\n <semantics>\n <mrow>\n <mi>F</mi>\n </mrow>\n </semantics></math>, with weights being automatically chosen during maximisation. This welcoming effect was not examined or documented in previous publications, although some suggested the possibility to aggregate multigenerational samples for non-constant <span></span><math>\n <semantics>\n <mrow>\n <msub>\n <mi>N</mi>\n <mi>e</mi>\n </msub>\n </mrow>\n </semantics></math> scenarios. Further investigation into the ML methods is required. Conceptually, MAXTEMP calculates <span></span><math>\n <semantics>\n <mrow>\n <mi>F</mi>\n </mrow>\n </semantics></math>'s from pairs of samples iteratively while ML jointly considers all data points, hence the idea of refining an earlier estimate does not exist in the latter. Given the lack of provision of software and the non-trivial effort to alter ML algorithms to cater to individual sampling plans, the timely arrival of MAXTEMP fills in the current resource gap as a drop-in alternative to existing tools (e.g., Do et al. <span>2014</span>).</p><p>The latest development of temporal <span></span><math>\n <semantics>\n <mrow>\n <mi>F</mi>\n </mrow>\n </semantics></math> has been shifting towards its application on genomic datasets, in which correlation among densely linked loci reduces the effective amount of information they hold (Hui et al. <span>2021</span>; Waples et al. <span>2022</span>). Even after accounting for pseudo-replication sample size is likely to be the limiting factor for problems like this. 
The precision of <span></span><math>\n <semantics>\n <mrow>\n <mi>F</mi>\n </mrow>\n </semantics></math> depends on <span></span><math>\n <semantics>\n <mrow>\n <mi>s</mi>\n <mo>/</mo>\n <msub>\n <mi>N</mi>\n <mi>e</mi>\n </msub>\n </mrow>\n </semantics></math>, meaning sample size cannot be perfectly determined <i>a priori</i>. Putting the issue of sample size aside, the ways in which MAXTEMP enhances <span></span><math>\n <semantics>\n <mrow>\n <msub>\n <mi>N</mi>\n <mi>e</mi>\n </msub>\n </mrow>\n </semantics></math> estimation also depend on the true size. For small, fragmented populations a certain local <span></span><math>\n <semantics>\n <mrow>\n <msub>\n <mi>N</mi>\n <mi>e</mi>\n </msub>\n </mrow>\n </semantics></math> has to be maintained to prevent inbreeding depression or mutational meltdown. Having tighter confidence intervals (C.I.) will undoubtably benefit the assessment of genetic health to ensure <span></span><math>\n <semantics>\n <mrow>\n <msub>\n <mi>N</mi>\n <mi>e</mi>\n </msub>\n </mrow>\n </semantics></math> is always kept above a certain threshold. It also facilitates early detection of population decline as swift action is required to prevent any irreversible loss of genetic variation, or worse, extinction. On the other end of the spectrum where larger <span></span><math>\n <semantics>\n <mrow>\n <msub>\n <mi>N</mi>\n <mi>e</mi>\n </msub>\n </mrow>\n </semantics></math> is of concern, the point estimate (or lower bound) of <span></span><math>\n <semantics>\n <mrow>\n <mi>F</mi>\n </mrow>\n </semantics></math> can be negative which is interpreted as no drift with infinitely large <span></span><math>\n <semantics>\n <mrow>\n <msub>\n <mi>N</mi>\n <mi>e</mi>\n </msub>\n </mrow>\n </semantics></math>. Reducing the variance of <span></span><math>\n <semantics>\n <mrow>\n <mi>F</mi>\n </mrow>\n </semantics></math> will lower the occurrence of yielding negative estimates as a direct consequence, or in other words, increase the chance of having meaningful <span></span><math>\n <semantics>\n <mrow>\n <msub>\n <mi>N</mi>\n <mi>e</mi>\n </msub>\n </mrow>\n </semantics></math> and C.I. for decision making. It is foreseeable that with the aid of computer simulation the spirit of MAXTEMP will inspire future designs of population monitoring programmes, such as to incorporate samples from historic or pilot studies into the main analyses, or to develop contingency plans with post hoc sampling to rescue a negative <span></span><math>\n <semantics>\n <mrow>\n <msub>\n <mi>N</mi>\n <mi>e</mi>\n </msub>\n </mrow>\n </semantics></math> estimate. With <span></span><math>\n <semantics>\n <mrow>\n <msub>\n <mi>N</mi>\n <mi>e</mi>\n </msub>\n </mrow>\n </semantics></math> being recently incorporated as one of the headline indicators by the Convention on Biological Diversity (Thurfjell et al. <span>2022</span>), the type of analyses requiring MAXTEMP is expected to grow. 
I am confident that the community will bring MAXTEMP to its full potential in years to come.</p><p>The author declares no conflicts of interest.</p>","PeriodicalId":211,"journal":{"name":"Molecular Ecology Resources","volume":"25 7","pages":""},"PeriodicalIF":5.5000,"publicationDate":"2025-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/1755-0998.14125","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Molecular Ecology Resources","FirstCategoryId":"99","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1111/1755-0998.14125","RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOCHEMISTRY & MOLECULAR BIOLOGY","Score":null,"Total":0}
Abstract
Effective population size ($N_e$) is indisputably one of the most important parameters in evolutionary biology. It governs the rate of evolution, the magnitude of drift, the effectiveness of selection, levels of diversity, and much more. It also serves as a key indicator to inform population monitoring programmes, from conservation of endangered species to biocontrol of agricultural pests or disease vectors, and almost everything in between. In applications in which the contemporary $N_e$ is of interest, temporal $F$ is one of the most widely used estimators. It measures the magnitude of drift among genetically neutral loci to estimate the harmonic mean $N_e$ between two time points. In this issue, Waples et al. (2025) present a new software package, "MAXTEMP", that improves the precision of temporal $F$ by incorporating additional samples outside of the focal period.
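To fix ideas, the moment relation underlying the temporal method can be sketched as follows; the notation here is illustrative, and the exact sampling corrections depend on the sampling plan in Waples (1989):

$$\mathrm{E}\!\left[\hat{F}\right] \approx \frac{1}{2S_0} + \frac{1}{2S_t} + \frac{t}{2N_e}, \qquad \hat{N}_e \approx \frac{t}{2\left(\hat{F} - \frac{1}{2S_0} - \frac{1}{2S_t}\right)},$$

where $S_0$ and $S_t$ are the numbers of individuals sampled at the two time points and $t$ is the number of generations separating them. When $\hat{F}$ falls below the sampling correction $1/(2S_0) + 1/(2S_t)$, no drift is detectable and $\hat{N}_e$ is reported as infinite, a point this commentary returns to below.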
I was sceptical at first, as these seemingly unrelated samples appear to be uninformative, especially after I had revisited Waples (2005) on the time periods to which temporal $N_e$ estimates apply. Upon closer inspection, the authors reassured us with an intuitive yet robust argument: consider a population monitoring programme with initially two temporal samples sandwiching the focal period. A direct $N_e$ estimate is obtained via temporal $F$ in the traditional way (Waples 1989). A third sample is subsequently collected, and the same $N_e$ can also be implied from the difference between the overall $F$ and that between the second and third time points. There exists a weighted average (linear combination) of the two that outperforms either estimate by having a lower variance. The idea can be generalised to an arbitrary number of additional samples from which the implied estimates are calculated. The remaining challenge is to find an appropriate weighting scheme, which the authors duly examined.
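As a minimal numerical sketch of this combination step (not MAXTEMP's actual weighting; the function name, toy variances and correlation below are chosen purely for illustration), the minimum-variance linear combination of two unbiased, correlated estimates of the same drift signal can be computed as follows:

    import numpy as np

    def combine_estimates(f_direct, f_implied, var_direct, var_implied, rho):
        # Minimum-variance linear combination of two unbiased, correlated
        # estimates of the same quantity: a generic statistical identity,
        # not the weighting actually implemented in MAXTEMP.
        cov = rho * np.sqrt(var_direct * var_implied)
        w = (var_implied - cov) / (var_direct + var_implied - 2.0 * cov)
        f_comb = w * f_direct + (1.0 - w) * f_implied
        var_comb = (w**2 * var_direct + (1.0 - w)**2 * var_implied
                    + 2.0 * w * (1.0 - w) * cov)
        return f_comb, var_comb, w

    # Toy numbers (purely illustrative): the implied estimate spans a longer
    # horizon, so it carries a larger variance and is positively correlated
    # with the direct one.
    f_hat, v_hat, w = combine_estimates(0.020, 0.024, 1.0e-4, 2.5e-4, rho=0.4)
    print(f"combined F = {f_hat:.4f}, variance = {v_hat:.2e}, weight on direct = {w:.2f}")

With these toy numbers the combined variance is only modestly smaller than that of the direct estimate; the gain grows as the correlation weakens or as the variances of the two estimates become more comparable.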
From a technical perspective, the direct and implied estimates share the same expectation, hence the combined one remains largely unbiased. Without any additional cost (apart from sampling more individuals), MAXTEMP brings the benefit of variance reduction, whose magnitude depends on the individual variances as well as their pairwise correlation. The implied estimates often come with much larger variances, given that they span longer horizons. As all estimates aim to extract the same drift signal for the focal period, positive correlations are unavoidably induced, limiting the potential to shrink the variance further. Despite these mathematical constraints, the authors demonstrated a reduction in the standard deviation of $F$ of up to 50% in selected cases. In my experience, the correlation weakens when the sample size $s$ is limited, as sampling noise colludes with the underlying drift. It is hugely encouraging that the improvement is greatest when it is most needed. While it is tempting to include as many additional samples as possible, the authors recommended adding no more than one from either side, beyond which the information becomes saturated. It is also possible to couple temporal $F$ with single-sample estimates (e.g., linkage disequilibrium among unlinked loci) to resolve the mystery $N_e$'s of unsampled generations.
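For completeness, and using generic notation rather than that of Waples et al. (2025): if the direct and implied estimates have variances $v_1$ and $v_2$ and correlation $\rho$ (covariance $c = \rho\sqrt{v_1 v_2}$), the best linear combination and its variance are

$$w^{*} = \frac{v_2 - c}{v_1 + v_2 - 2c}, \qquad \operatorname{Var}\!\left(w^{*}\hat{F}_{1} + (1 - w^{*})\hat{F}_{2}\right) = \frac{v_1 v_2\left(1 - \rho^{2}\right)}{v_1 + v_2 - 2c}.$$

In the typical setting where $v_2 > v_1$, the gain over using the direct estimate alone diminishes as $\rho$ grows and disappears entirely once $c$ reaches $v_1$, which is the mathematical constraint referred to above.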
In parallel with temporal $F$, maximum likelihood (ML) approaches are also being used to extract the same drift signal by considering the distributional change in allele frequency over time (Williamson and Slatkin 1999; Hui and Burt 2015). Both Waples et al. and I discovered that ML also benefits from having additional samples in almost the same way as the moment-based $F$, with the weights being chosen automatically during maximisation. This welcome effect was not examined or documented in previous publications, although some suggested the possibility of aggregating multigenerational samples for non-constant $N_e$ scenarios. Further investigation into the ML methods is required. Conceptually, MAXTEMP calculates $F$'s from pairs of samples iteratively, while ML jointly considers all data points, hence the idea of refining an earlier estimate does not exist in the latter. Given the lack of readily available software and the non-trivial effort required to alter ML algorithms to cater to individual sampling plans, the timely arrival of MAXTEMP fills the current resource gap as a drop-in alternative to existing tools (e.g., Do et al. 2014).
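Schematically, and glossing over how the drift transition and the initial frequency are modelled in any particular implementation, the ML formulation for a single diallelic locus sampled at generations $t_0 < t_1 < \dots < t_k$ treats the population allele frequencies $x_0, \dots, x_k$ as hidden states:

$$L(N_e) = \int \pi(x_0) \prod_{j=0}^{k} \binom{2S_j}{c_j} x_j^{c_j} (1 - x_j)^{2S_j - c_j} \prod_{j=1}^{k} f_{N_e}\!\left(x_j \mid x_{j-1},\ t_j - t_{j-1}\right) \, dx_0 \cdots dx_k,$$

where $c_j$ is the observed allele count among $2S_j$ sampled gene copies at time $t_j$ and $f_{N_e}$ is the drift transition density over the elapsed generations. Because every additional sampling time simply extends the two products, extra samples are absorbed automatically, which is why the implicit weighting arises without any explicit intervention.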
The latest development of temporal $F$ has been shifting towards its application to genomic datasets, in which correlation among densely linked loci reduces the effective amount of information they hold (Hui et al. 2021; Waples et al. 2022). Even after accounting for pseudo-replication, sample size is likely to be the limiting factor for problems like this. The precision of $F$ depends on $s/N_e$, meaning sample size cannot be perfectly determined a priori. Putting the issue of sample size aside, the ways in which MAXTEMP enhances $N_e$ estimation also depend on the true size. For small, fragmented populations, a certain local $N_e$ has to be maintained to prevent inbreeding depression or mutational meltdown. Having tighter confidence intervals (C.I.) will undoubtedly benefit the assessment of genetic health to ensure $N_e$ is always kept above a certain threshold. It also facilitates early detection of population decline, where swift action is required to prevent any irreversible loss of genetic variation or, worse, extinction. On the other end of the spectrum, where larger $N_e$ is of concern, the point estimate (or lower bound) of $F$ can be negative, which is interpreted as no drift and an infinitely large $N_e$. Reducing the variance of $F$ will, as a direct consequence, lower the occurrence of negative estimates, or in other words increase the chance of obtaining a meaningful $N_e$ and C.I. for decision making. It is foreseeable that, with the aid of computer simulation, the spirit of MAXTEMP will inspire future designs of population monitoring programmes, such as incorporating samples from historic or pilot studies into the main analyses, or developing contingency plans with post hoc sampling to rescue a negative $N_e$ estimate. With $N_e$ recently incorporated as one of the headline indicators by the Convention on Biological Diversity (Thurfjell et al. 2022), the kind of analysis that requires MAXTEMP is expected to grow. I am confident that the community will bring MAXTEMP to its full potential in years to come.

The author declares no conflicts of interest.
Journal introduction:
Molecular Ecology Resources promotes the creation of comprehensive resources for the scientific community, encompassing computer programs, statistical and molecular advancements, and a diverse array of molecular tools. Serving as a conduit for disseminating these resources, the journal targets a broad audience of researchers in the fields of evolution, ecology, and conservation. Articles in Molecular Ecology Resources are crafted to support investigations tackling significant questions within these disciplines.
In addition to original resource articles, Molecular Ecology Resources features Reviews, Opinions, and Comments relevant to the field. The journal also periodically releases Special Issues focusing on resource development within specific areas.