Yu Qian, Qianqian Peng, Qili Qian, Xingjian Gao, Xinxuan Liu, Yi Li, Xiu Fan, Yuan Cheng, Na Yuan, Sibte Hadi, Li Jin, Sijia Wang, Fan Liu
{"title":"A methylation panel of 10 CpGs for accurate age inference via stepwise conditional epigenome-wide association study.","authors":"Yu Qian, Qianqian Peng, Qili Qian, Xingjian Gao, Xinxuan Liu, Yi Li, Xiu Fan, Yuan Cheng, Na Yuan, Sibte Hadi, Li Jin, Sijia Wang, Fan Liu","doi":"10.1007/s00414-024-03365-2","DOIUrl":null,"url":null,"abstract":"<p><p>Estimating individual age from DNA methylation at age associated CpG sites may provide key information facilitating forensic investigations. Systematic marker screening and feature selection play a critical role in ensuring the performance of the final prediction model. In the discovery stage, we screened for 811876 CpGs from whole blood of 2664 Chinese individuals ranging from 18 to 83 years of age based on a stepwise conditional epigenome-wide association study (SCEWAS). The SCEWAS identified 28 CpGs showing genome-wide significant and independent effects. Further restricting this panel to 10 most informative CpGs showed a tolerable loss of information. A linear model consisting of these 10 CpGs could explain 93% of the age variance (R<sup>2</sup> = 0.93) in the training set (n = 2664). In an independent test set of Chinese individuals (n = 648), this model also provided highly accurate predictions (R<sup>2</sup> = 0.85, mean absolute deviation, MAD = 3.20 years). The model was additionally validated in a public dataset of multiple ancestral origins (86 Europeans, 14 Asians, and 273 Africans) and the prediction accuracy reduced significantly (R<sup>2</sup> = 0.85, MAD = 6.21 years), as might be expected due to different genomic backgrounds, sample sizes, and age ranges. Our 10 CpG model also outperformed the recently proposed 9-CpG model constructed in 390 Chinese males (R<sup>2</sup> = 0.79 in test set). We also demonstrated that our SCEWAS approach outperformed the traditional EWAS and the elastic net approach in obtaining a small set of most age informative CpGs. Overall, our systematic genome-wide feature selection identified a small panel of 10 CpGs for accurate age estimation with high potential in forensic applications.</p>","PeriodicalId":14071,"journal":{"name":"International Journal of Legal Medicine","volume":" ","pages":""},"PeriodicalIF":2.2000,"publicationDate":"2024-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Legal Medicine","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1007/s00414-024-03365-2","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MEDICINE, LEGAL","Score":null,"Total":0}
引用次数: 0
Abstract
Estimating individual age from DNA methylation at age associated CpG sites may provide key information facilitating forensic investigations. Systematic marker screening and feature selection play a critical role in ensuring the performance of the final prediction model. In the discovery stage, we screened for 811876 CpGs from whole blood of 2664 Chinese individuals ranging from 18 to 83 years of age based on a stepwise conditional epigenome-wide association study (SCEWAS). The SCEWAS identified 28 CpGs showing genome-wide significant and independent effects. Further restricting this panel to 10 most informative CpGs showed a tolerable loss of information. A linear model consisting of these 10 CpGs could explain 93% of the age variance (R2 = 0.93) in the training set (n = 2664). In an independent test set of Chinese individuals (n = 648), this model also provided highly accurate predictions (R2 = 0.85, mean absolute deviation, MAD = 3.20 years). The model was additionally validated in a public dataset of multiple ancestral origins (86 Europeans, 14 Asians, and 273 Africans) and the prediction accuracy reduced significantly (R2 = 0.85, MAD = 6.21 years), as might be expected due to different genomic backgrounds, sample sizes, and age ranges. Our 10 CpG model also outperformed the recently proposed 9-CpG model constructed in 390 Chinese males (R2 = 0.79 in test set). We also demonstrated that our SCEWAS approach outperformed the traditional EWAS and the elastic net approach in obtaining a small set of most age informative CpGs. Overall, our systematic genome-wide feature selection identified a small panel of 10 CpGs for accurate age estimation with high potential in forensic applications.
期刊介绍:
The International Journal of Legal Medicine aims to improve the scientific resources used in the elucidation of crime and related forensic applications at a high level of evidential proof. The journal offers review articles tracing development in specific areas, with up-to-date analysis; original articles discussing significant recent research results; case reports describing interesting and exceptional examples; population data; letters to the editors; and technical notes, which appear in a section originally created for rapid publication of data in the dynamic field of DNA analysis.