Stability selection enhances feature selection and enables accurate prediction of gestational age using only five DNA methylation sites.

Impact factor: 5.7 · CAS zone 2 (Medicine) · JCR Q1 (Medicine)
Kristine L Haftorn, Julia Romanowska, Yunsung Lee, Christian M Page, Per M Magnus, Siri E Håberg, Jon Bohlin, Astanand Jugessur, William R P Denault
Clinical Epigenetics 15(1):114. Published 2023-07-13. Publication type: Journal Article.
DOI: 10.1186/s13148-023-01528-3 (https://doi.org/10.1186/s13148-023-01528-3)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10339624/pdf/
Citations: 0

Abstract


Background: DNA methylation (DNAm) is robustly associated with chronological age in children and adults, and gestational age (GA) in newborns. This property has enabled the development of several epigenetic clocks that can accurately predict chronological age and GA. However, why the predictive CpGs overlap so little across different epigenetic clocks remains unexplained. Our main aim was therefore to identify and characterize CpGs that are stably predictive of GA.

Results: We applied a statistical approach called 'stability selection' to DNAm data from 2138 newborns in the Norwegian Mother, Father, and Child Cohort study. Stability selection combines subsampling with variable selection to restrict the number of false discoveries in the set of selected variables. Twenty-four CpGs were identified as being stably predictive of GA. Intriguingly, at most 10% of the CpGs in previous GA clocks were found to be stably selected. Based on these results, we used generalized additive model regression to develop a new GA clock consisting of only five CpGs, which showed predictive performance similar to that of previous GA clocks (R² = 0.674, median absolute deviation = 4.4 days). These CpGs were in or near genes and regulatory regions involved in immune responses, metabolism, and developmental processes. Furthermore, accounting for nonlinear associations improved prediction performance in preterm newborns.
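
The idea of combining subsampling with variable selection can be illustrated with a minimal, standard-library Python sketch. It is not the authors' pipeline: the abstract does not state the base selector, the number of subsamples, or the selection threshold, so everything here is an assumption made for illustration. The base selector (`top_q_selector`, which keeps the `q` features most correlated with the outcome), the toy data, and the values `q=5`, `n_subsamples=100`, and `threshold=0.8` are all hypothetical. The helper `pfer_bound` computes the expected-false-selections bound from Meinshausen and Bühlmann's formulation of stability selection, q² / ((2·threshold − 1)·p), which holds under that paper's assumptions and for thresholds above 0.5.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((v - my) ** 2 for v in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def top_q_selector(X, y, q):
    """Hypothetical base selector: the q features with the largest |correlation| with y."""
    p = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(p)]
    return set(sorted(range(p), key=lambda j: scores[j], reverse=True)[:q])


def stability_selection(X, y, q, n_subsamples=100, threshold=0.8, seed=0):
    """Run the base selector on random half-samples and keep the features
    whose selection frequency reaches the threshold."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), n // 2)  # subsample half the rows without replacement
        selected = top_q_selector([X[i] for i in idx], [y[i] for i in idx], q)
        for j in selected:
            counts[j] += 1
    freqs = [c / n_subsamples for c in counts]
    stable = [j for j, f in enumerate(freqs) if f >= threshold]
    return stable, freqs


def pfer_bound(q, p, threshold):
    """Upper bound on the expected number of false selections
    (Meinshausen and Buhlmann); requires 0.5 < threshold <= 1."""
    return q ** 2 / ((2 * threshold - 1) * p)


def make_toy_data(n=200, p=30, seed=1):
    """Synthetic data: only features 0, 1 and 2 truly influence the outcome."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [2.0 * r[0] - 2.0 * r[1] + 1.5 * r[2] + rng.gauss(0, 1) for r in X]
    return X, y


X, y = make_toy_data()
stable, freqs = stability_selection(X, y, q=5)
print("stably selected features:", stable)
print("bound on expected false selections:", round(pfer_bound(5, 30, 0.8), 3))
```

In this toy setting the three informative features are picked in essentially every half-sample, while noise features enter the top five only sporadically and so rarely reach the frequency threshold. A penalized regression such as the lasso could be swapped in for `top_q_selector` without changing the surrounding loop.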

Conclusion: We present a methodological framework for feature selection that is broadly applicable to any trait that can be predicted from DNAm data. We demonstrate its utility by identifying CpGs that are highly predictive of GA and present a new and highly performant GA clock based on only five CpGs that is more amenable to a clinical setting.

Source journal
Clinical Epigenetics (Biochemistry, Genetics and Molecular Biology: Developmental Biology)
CiteScore: 8.90
Self-citation rate: 5.30%
Articles published: 150
Review time: 12 weeks
About the journal: Clinical Epigenetics, the official journal of the Clinical Epigenetics Society, is an open access, peer-reviewed journal that encompasses all aspects of epigenetic principles and mechanisms in relation to human disease, diagnosis and therapy. Clinical trials and research in disease model organisms are particularly welcome.