Maximising the Potential of Temporal $N_{e}$ Estimation in Long-Term Population Monitoring Programmes


Molecular Ecology Resources, 25(7). Published 2025-05-23. DOI: 10.1111/1755-0998.14125

Tin-Yu J. Hui



Abstract

Effective population size ($N_{e}$) is indisputably one of the most important parameters in evolutionary biology. It governs the rate of evolution, the magnitude of drift, the effectiveness of selection, diversity, and much more. It also serves as a key indicator for population monitoring programmes, from the conservation of endangered species to the biocontrol of agricultural pests and disease vectors, and almost everything in between. In applications where the contemporary $N_{e}$ is of interest, temporal $F$ is one of the most widely used estimators. It measures the magnitude of drift at genetically neutral loci to estimate the harmonic mean $N_{e}$ between two time points. In this issue, Waples et al. (2025) present new software, MAXTEMP, that improves the precision of temporal $F$ by incorporating additional samples from outside the focal period.
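
For readers new to the estimator, the sketch below computes temporal $F$ as the Nei and Tajima (1981) $F_c$ for biallelic loci and converts it to $N_{e}$ with the sampling correction of Waples (1989) for sampling plan II, $N_{e} = t / [2(F - 1/(2S_0) - 1/(2S_t))]$, where $S_0$ and $S_t$ are the numbers of diploid individuals sampled $t$ generations apart. It is a minimal illustration under stated assumptions, not the MAXTEMP implementation: the unweighted mean across loci, the restriction to biallelic loci and the example frequencies are all choices made here for brevity.

```python
import math

def fc_biallelic(x, y):
    """Nei & Tajima (1981) Fc at one biallelic locus (x, y: frequencies of the same allele).
    With two alleles both allele terms are identical, so Fc reduces to one ratio."""
    denom = (x + y) / 2 - x * y
    return None if denom <= 0 else (x - y) ** 2 / denom  # None: fixed for the same allele

def temporal_f(p0, pt):
    """Unweighted mean Fc across informative loci (an assumption; weightings vary)."""
    values = [f for f in (fc_biallelic(x, y) for x, y in zip(p0, pt)) if f is not None]
    if not values:
        raise ValueError("no informative loci")
    return sum(values) / len(values)

def ne_plan2(f, t, s0, st):
    """Waples (1989) plan II: remove the expected sampling contribution, then invert.
    A non-positive corrected F is read as no detectable drift (infinite Ne)."""
    f_drift = f - 1 / (2 * s0) - 1 / (2 * st)
    return math.inf if f_drift <= 0 else t / (2 * f_drift)

# Hypothetical frequencies at three loci in samples taken t = 4 generations apart.
p0 = [0.30, 0.55, 0.72]
pt = [0.38, 0.47, 0.80]
f = temporal_f(p0, pt)
print(f"F = {f:.4f}, Ne = {ne_plan2(f, t=4, s0=50, st=50):.1f}")
```

Returning `math.inf` rather than a negative number mirrors the convention discussed later in this commentary for negative $F$.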

I was sceptical at first, as these seemingly unrelated samples appeared to be uninformative, especially after I revisited Waples (2005) on the time periods to which temporal $N_{e}$ estimates apply. On closer inspection, the authors reassured us with an intuitive yet robust argument. Consider a population monitoring programme that initially has two temporal samples bracketing the focal period. A direct $N_{e}$ estimate is obtained from temporal $F$ in the traditional way (Waples 1989). A third sample is subsequently collected, and the same $N_{e}$ can also be implied from the difference between the overall $F$ (first to third sample) and the $F$ between the second and third time points. There exists a weighted average (linear combination) of the two estimates that outperforms either one by having a lower variance. The idea generalises to an arbitrary number of additional samples, each yielding an implied estimate. The remaining challenge is to find an appropriate weighting scheme, which the authors duly examined.
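
The three-sample argument can be checked with a small simulation. The sketch assumes Wright-Fisher drift with binomial resampling, plan II-style binomial gene sampling, and hypothetical settings (200 loci, $N_{e}$ = 50, two generations between samples, 30 diploids per sample). The direct estimate is the sampling-corrected $F$ between samples 1 and 2; the implied one is the corrected $F$ from 1 to 3 minus that from 2 to 3, which approximately isolates the focal-period drift. The weight is an oracle chosen from the simulated replicates themselves, which is only possible in simulation; MAXTEMP's own weighting scheme (Waples et al. 2025) is not reproduced here.

```python
import numpy as np

def fc(x, y):
    """Mean Nei & Tajima (1981) Fc over biallelic loci (x, y: arrays of allele frequencies)."""
    denom = (x + y) / 2 - x * y
    keep = denom > 0  # drop loci fixed for the same allele in both samples
    return np.mean((x[keep] - y[keep]) ** 2 / denom[keep])

def f_drift(x, y, s_x, s_y):
    """Fc minus the plan II sampling terms of Waples (1989)."""
    return fc(x, y) - 1 / (2 * s_x) - 1 / (2 * s_y)

def drift(rng, p, ne, generations):
    """Wright-Fisher drift: 2*ne genes resampled binomially each generation."""
    for _ in range(generations):
        p = rng.binomial(2 * ne, p) / (2 * ne)
    return p

def sample(rng, p, s):
    """Allele frequencies observed in s diploid individuals (2*s genes)."""
    return rng.binomial(2 * s, p) / (2 * s)

def replicate(rng, n_loci=200, ne=50, t12=2, t23=2, s=30):
    """One run: samples 1 and 2 bracket the focal period, sample 3 comes later."""
    p1 = rng.uniform(0.2, 0.8, n_loci)
    p2 = drift(rng, p1, ne, t12)
    p3 = drift(rng, p2, ne, t23)
    x1, x2, x3 = (sample(rng, p, s) for p in (p1, p2, p3))
    direct = f_drift(x1, x2, s, s)
    implied = f_drift(x1, x3, s, s) - f_drift(x2, x3, s, s)  # overall minus 2->3
    return direct, implied

rng = np.random.default_rng(2025)
runs = np.array([replicate(rng) for _ in range(500)])
direct, implied = runs[:, 0], runs[:, 1]

# Oracle weight minimising the empirical variance of w*direct + (1-w)*implied.
cov = np.cov(direct, implied)
w = (cov[1, 1] - cov[0, 1]) / (cov[0, 0] + cov[1, 1] - 2 * cov[0, 1])
combined = w * direct + (1 - w) * implied

print("target t12/(2Ne):", 2 / (2 * 50))
for name, est in (("direct", direct), ("implied", implied), ("combined", combined)):
    print(f"{name:>8}: mean={est.mean():.4f}  sd={est.std(ddof=1):.4f}")
print("weight on direct:", round(float(w), 3))
```

Because the weights 0 and 1 are among the candidates, the empirical SD of the combined estimate can never exceed that of the better single estimate; how much lower it falls depends on the variances and the correlation between the two.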

From a technical perspective, the direct and implied estimates share the same expectation, hence the combined estimate remains largely unbiased. At no additional cost (apart from sampling more individuals), MAXTEMP delivers a reduction in variance whose magnitude depends on the individual variances as well as their pairwise correlations. The implied estimates often have much larger variances because they span longer time horizons. As all estimates aim to extract the same drift signal for the focal period, positive correlations are unavoidably induced, limiting the potential to shrink the variance further. Despite these mathematical constraints, the authors demonstrated a reduction of up to 50% in the standard deviation of $F$ in selected cases. In my experience, the correlation weakens when the sample size $s$ is limited, as sampling noise becomes entangled with the underlying drift. It is hugely encouraging that the improvement is greatest when it is most needed. While it is tempting to include as many additional samples as possible, the authors recommend adding no more than one on either side of the focal period, beyond which the information becomes saturated. It is also possible to couple temporal $F$ with single-sample estimators (e.g., linkage disequilibrium among unlinked loci) to resolve the unknown $N_{e}$ values of unsampled generations.
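
How the gain depends on the variances and the correlation can be made concrete with the standard result for combining two correlated unbiased estimators. The sketch below is that textbook formula, not necessarily the weighting used by MAXTEMP; the variance ratio of 4 (implied SD twice the direct SD) and the correlation values are illustrative assumptions.

```python
import math

def min_variance_combination(var_direct, var_implied, rho):
    """Weight on the direct estimate and the variance of the minimum-variance
    linear combination of two unbiased estimators with correlation rho."""
    cov = rho * math.sqrt(var_direct * var_implied)
    denom = var_direct + var_implied - 2 * cov  # = Var(direct - implied)
    if denom <= 0:
        raise ValueError("estimators are perfectly correlated with equal variance")
    w = (var_implied - cov) / denom
    var = (var_direct * var_implied - cov ** 2) / denom
    return w, var

# Implied estimate with twice the SD of the direct one (variance ratio 4).
for rho in (-0.25, 0.0, 0.25, 0.5):
    w, var = min_variance_combination(1.0, 4.0, rho)
    print(f"rho={rho:+.2f}  w={w:.3f}  sd ratio={math.sqrt(var):.3f}")
```

With these inputs the SD ratio (combined over direct) is about 0.791 at rho = -0.25 and 0.894 at rho = 0, but climbs to 1.000 at rho = 0.5, where all weight goes to the direct estimate and nothing is gained. This is the sense in which positive correlation caps the improvement, and why a weaker correlation under small $s$ leaves more room for it.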

In parallel with temporal $F$, maximum likelihood (ML) approaches have been used to extract the same drift signal by modelling the change in the distribution of allele frequencies over time (Williamson and Slatkin 1999; Hui and Burt 2015). Both Waples et al. and I found that ML also benefits from additional samples in much the same way as the moment-based $F$, with the weights being chosen automatically during maximisation. This welcome effect has not been examined or documented in previous publications, although some have suggested aggregating multigenerational samples for scenarios with non-constant $N_{e}$. Further investigation into ML methods is required. Conceptually, MAXTEMP calculates $F$ from pairs of samples iteratively, whereas ML jointly considers all data points, so the idea of refining an earlier estimate does not arise in the latter. Given the lack of available software and the non-trivial effort needed to adapt ML algorithms to individual sampling plans, the timely arrival of MAXTEMP fills the current resource gap as a drop-in alternative to existing tools (e.g., Do et al. 2014).
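
The contrast between pairwise $F$ and a joint likelihood can be illustrated with a hidden-Markov sketch in the spirit of Williamson and Slatkin (1999), though not their implementation nor that of Hui and Burt (2015). It assumes a single $N_{e}$ that is constant over all intervals (a simplification of the focal-period setting), Wright-Fisher transitions on allele counts, binomial gene sampling, a flat prior on the initial count and no mutation or selection. A third sample simply enters as one more sampling term in the forward recursion, so no explicit weights are needed. The data and the coarse grid search are hypothetical.

```python
import math
import numpy as np

def binom_pmf(k, n, p):
    """Binomial pmf of k successes in n trials, vectorised over p."""
    return float(math.comb(n, k)) * p ** k * (1.0 - p) ** (n - k)

def wf_transition(n_e):
    """One-generation Wright-Fisher transition matrix over allele counts 0..2*n_e."""
    m = 2 * n_e
    freqs = np.arange(m + 1) / m
    # entry [i, j] = P(count j next generation | count i now)
    return np.column_stack([binom_pmf(j, m, freqs) for j in range(m + 1)])

def log_likelihood(counts, sample_sizes, gaps, n_e):
    """Joint log-likelihood of allele counts from several temporal samples.

    counts:       (n_loci, n_samples) allele counts; sample k is out of 2*sample_sizes[k]
    sample_sizes: diploid sample size of each sample
    gaps:         generations between consecutive samples (n_samples - 1 values)
    n_e:          one Ne assumed constant over all intervals (simplification)
    """
    m = 2 * n_e
    freqs = np.arange(m + 1) / m
    step = wf_transition(n_e)
    moves = [np.linalg.matrix_power(step, g) for g in gaps]
    total = 0.0
    for locus in np.atleast_2d(counts):
        # forward algorithm: alpha[i] = P(samples so far, current allele count = i)
        alpha = np.full(m + 1, 1.0 / (m + 1))  # flat prior on the initial count
        for k, (c, s) in enumerate(zip(locus, sample_sizes)):
            if k > 0:
                alpha = alpha @ moves[k - 1]  # drift since the previous sample
            alpha = alpha * binom_pmf(int(c), 2 * s, freqs)  # binomial gene sampling
        total += math.log(alpha.sum())
    return total

# Hypothetical data: 3 loci, 3 samples of 30 diploids taken 2 generations apart.
counts = np.array([[24, 31, 35], [40, 33, 30], [18, 20, 27]])
grid = list(range(10, 101, 10))
lls = [log_likelihood(counts, (30, 30, 30), (2, 2), ne) for ne in grid]
print("ML Ne on the grid:", grid[int(np.argmax(lls))])
```

A real analysis would use many more loci, a proper optimiser and, for the focal-period question, separate $N_{e}$ parameters per interval; this is only meant to show where the extra sample enters.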

Recent development of temporal $F$ has shifted towards genomic datasets, in which correlation among densely linked loci reduces the effective amount of information they hold (Hui et al. 2021; Waples et al. 2022). Even after accounting for this pseudo-replication, sample size is likely to be the limiting factor in such problems. The precision of $F$ depends on $s / N_{e}$, and because $N_{e}$ is itself the unknown being estimated, an adequate sample size cannot be determined perfectly a priori. Setting sample size aside, the ways in which MAXTEMP enhances $N_{e}$ estimation also depend on the true size. For small, fragmented populations, a certain local $N_{e}$ has to be maintained to prevent inbreeding depression or mutational meltdown. Tighter confidence intervals (CIs) will undoubtedly benefit the assessment of genetic health, helping to ensure that $N_{e}$ stays above a given threshold. They also facilitate early detection of population decline, where swift action is required to prevent irreversible loss of genetic variation or, worse, extinction. At the other end of the spectrum, where larger $N_{e}$ is of concern, the point estimate (or lower bound) of $F$ can be negative, which is interpreted as no drift, i.e. an infinitely large $N_{e}$. As a direct consequence, reducing the variance of $F$ lowers the frequency of negative estimates; in other words, it increases the chance of obtaining a meaningful $N_{e}$ and CI for decision-making. It is foreseeable that, with the aid of computer simulation, the spirit of MAXTEMP will inspire future designs of population monitoring programmes, for example by incorporating samples from historical or pilot studies into the main analyses, or by developing contingency plans with post hoc sampling to rescue a negative $N_{e}$ estimate. With $N_{e}$ recently incorporated as one of the headline indicators of the Convention on Biological Diversity (Thurfjell et al. 2022), the range of analyses requiring MAXTEMP is expected to grow.
I am confident that the community will bring MAXTEMP to its full potential in the years to come.
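
The dependence on $s / N_{e}$, and the origin of negative estimates for large $N_{e}$, follow from the usual large-sample approximation for the expectation of temporal $F$ under sampling plan II (Waples 1989). Assuming, for simplicity, equal sample sizes $S_0 = S_t = s$ taken $t$ generations apart, the share of $F$ that is due to drift is:

$$
\mathbb{E}(F) \approx \frac{t}{2N_e} + \frac{1}{2S_0} + \frac{1}{2S_t} = \frac{t}{2N_e} + \frac{1}{s},
\qquad
\frac{t/(2N_e)}{\mathbb{E}(F)} \approx \frac{t\,s}{t\,s + 2N_e} = \frac{1}{1 + \dfrac{2}{t}\cdot\dfrac{N_e}{s}}.
$$

When $N_{e}$ is large relative to $t\,s$, the drift signal is a small fraction of $F$, so sampling noise can push $F - 1/(2S_0) - 1/(2S_t)$ below zero, and the estimate $\hat{N}_e = t / [2(F - 1/(2S_0) - 1/(2S_t))]$ becomes negative (conventionally reported as infinite). Any reduction in the variance of $F$ makes such outcomes less frequent.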

The author declares no conflicts of interest.
