Calibrating epigenetic clocks with training data error
Benjamin Mayne, Oliver Berry, Simon Jarman
Evolutionary Applications, vol. 16, no. 8, pp. 1496-1502, published 2023-07-26
Impact factor 3.5; JCR Q1, Evolutionary Biology
DOI: 10.1111/eva.13582
https://onlinelibrary.wiley.com/doi/10.1111/eva.13582
Citations: 0
Abstract
Animal age data are valuable for managing wildlife populations. Yet for most species there is no practical method for determining the age of individuals of unknown age. Epigenetic clocks, a molecular method, can predict age by sampling specific tissue types and measuring DNA methylation levels at specific loci. Developing an epigenetic clock requires a large number of samples from animals of known age. For most species, no individuals have exactly known ages, which makes epigenetic clock calibration inaccurate or impossible. For many epigenetic clocks, calibration samples with inaccurate age estimates introduce some error into the calibration. In this study, we investigated how much error in the training data set of an epigenetic clock can be tolerated before it causes an unacceptable increase in age-prediction error. Using four publicly available data sets, we artificially increased the error in the training-data ages in increments of 1% and then tested each model against an independent set of known ages. A small increase in effect size (Cohen's d > 0.2) was detected when the age error exceeded 22%. The effect size increased linearly with age error, and this threshold was independent of sample size. The intended downstream use of the age data may matter more in deciding how much age-prediction error can be tolerated. If highly precise age estimates are required, it may be futile to develop an epigenetic clock without an accurately aged calibration population. However, for other problems, such as determining the relative age order of pairs of individuals, a lower-quality calibration data set may be adequate.
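The evaluation described in the abstract can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' pipeline: the simulated methylation data, the closed-form ridge regression (a stand-in, since the abstract does not name the model), the uniform plus-or-minus p% perturbation of each training age, and every function name (`simulate_methylation`, `fit_ridge`, `perturb_ages`, `cohens_d`, `error_sweep`, `first_threshold`, `pairwise_order_accuracy`) are hypothetical. It adds training-age error in 1% increments, refits the clock, compares absolute test-set errors against an error-free baseline with Cohen's d, and reports the first error level where d exceeds 0.2. The last function scores the relative age order of pairs, the lower-precision use case mentioned at the end of the abstract.

```python
import numpy as np


def simulate_methylation(ages, n_loci, rng):
    """Hypothetical toy data: each locus changes linearly with age, plus measurement noise."""
    slopes = rng.normal(0.0, 0.01, size=n_loci)        # assumed per-locus age effect
    intercepts = rng.uniform(0.2, 0.8, size=n_loci)    # assumed baseline methylation level
    noise = rng.normal(0.0, 0.02, size=(ages.size, n_loci))
    return intercepts + np.outer(ages, slopes) + noise


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression; a stand-in for the penalized model of a real clock."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    intercept = y_mean - x_mean @ coef
    return coef, intercept


def perturb_ages(ages, error_fraction, rng):
    """Assumption: each training age is shifted by up to +/- error_fraction of its true value."""
    return ages * (1.0 + rng.uniform(-error_fraction, error_fraction, size=ages.size))


def cohens_d(a, b):
    """Cohen's d with pooled standard deviation; positive when b has the larger mean."""
    n1, n2 = a.size, b.size
    pooled = np.sqrt(((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2))
    return (b.mean() - a.mean()) / pooled


def error_sweep(n_train=200, n_test=100, n_loci=50, max_error=0.5, seed=0):
    """Return (error_percent, cohens_d) pairs for training-age error in 1% steps."""
    rng = np.random.default_rng(seed)
    ages = rng.uniform(1.0, 40.0, size=n_train + n_test)
    X = simulate_methylation(ages, n_loci, rng)          # same loci for train and test
    X_train, X_test = X[:n_train], X[n_train:]
    train_age, test_age = ages[:n_train], ages[n_train:]

    # Baseline: clock trained on error-free ages.
    coef, b0 = fit_ridge(X_train, train_age)
    baseline_err = np.abs(X_test @ coef + b0 - test_age)

    results = []
    for pct in range(1, int(round(max_error * 100)) + 1):
        noisy_age = perturb_ages(train_age, pct / 100, rng)
        coef, b0 = fit_ridge(X_train, noisy_age)
        err = np.abs(X_test @ coef + b0 - test_age)     # always scored against true test ages
        results.append((pct, cohens_d(baseline_err, err)))
    return results


def first_threshold(results, d_cut=0.2):
    """First error percentage whose effect size exceeds d_cut, or None."""
    for pct, d in results:
        if d > d_cut:
            return pct
    return None


def pairwise_order_accuracy(true_age, pred_age):
    """Fraction of individual pairs whose predicted age order matches the true order."""
    i, j = np.triu_indices(true_age.size, k=1)
    agree = np.sign(true_age[i] - true_age[j]) == np.sign(pred_age[i] - pred_age[j])
    return agree.mean()
```

Calling `first_threshold(error_sweep())` returns the first simulated error level with d > 0.2, or `None` if no level exceeds it. Whatever value it returns describes only this toy simulation and should not be read as reproducing the 22% threshold reported in the study.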
About the journal:
Evolutionary Applications is a fully peer-reviewed open access journal. It publishes papers that use concepts from evolutionary biology to address biological questions of health, social and economic relevance. Papers are expected to employ evolutionary concepts or methods to contribute to areas such as (but not limited to) medicine, agriculture, forestry, exploitation and management (fisheries and wildlife), aquaculture, conservation biology, environmental sciences (including climate change and invasion biology), microbiology and toxicology. All taxonomic groups are covered, including microbes, fungi, plants and animals. To better serve the community, we also strongly encourage submissions that use modern molecular and genetic methods (population and functional genomics, transcriptomics, proteomics, epigenetics, quantitative genetics, association and linkage mapping) to address important questions in any of these disciplines within an applied evolutionary framework. Theoretical, empirical, synthesis and perspective papers are welcome.