Optimizing Genetic Ancestry Adjustment in DNA Methylation Studies: A Comparative Analysis of Approaches

Kira D Höffler, Seyma Katrinli, Matthew W Halvorsen, Anne-Kristin Stavrum, Kevin S O'Connell, Alexey Shadrin, Srdjan Djurovic, Ole A Andreassen, James J Crowley, Jan Haavik, Kristen Hagen, Gerd Kvale, Kerry Ressler, Bjarne Hansen, Jair C Soares, Gabriel R Fries, Alicia K Smith, Stéphanie Le Hellard
Research Square (journal article), published 2025-05-18. DOI: 10.21203/rs.3.rs-6580295/v1 (https://doi.org/10.21203/rs.3.rs-6580295/v1). Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12136211/pdf/

Background: Genetic ancestry is an important factor to account for in DNA methylation studies because genetic variation influences DNA methylation patterns. When genotyping data are not available, one approach adjusts for ancestry using principal components (PCs) calculated from CpG sites that overlap with common SNPs. However, this method does not remove variation from technical and biological factors, such as sex and age, before the PCs are calculated. The first PC is therefore often associated with factors other than ancestry.

Methods: We developed the adapted EpiAnceR + approach, which includes 1) residualizing the CpG data overlapping with common SNPs for control probe PCs, sex, age, and cell type proportions to remove the effects of technical and biological factors, and 2) integrating the residualized data with genotype calls from the SNP probes (commonly referred to as rs probes) present on the arrays, before calculating PCs. We then evaluated the clustering ability of the resulting PCs and their relationship to genetic ancestry.
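
The two steps described in the Methods can be illustrated with a small, self-contained sketch. This is not the authors' implementation (their R code is in the GitHub repository linked in the Conclusions); it is a toy example under stated assumptions. All data values and helper names (`orthonormal_basis`, `residualize`, `first_pc_scores`) are hypothetical, only sex and age are used as covariates (control probe PCs and cell type proportions are omitted), genotype calls are assumed to be coded 0/1/2 and are only mean-centered, and only the first PC is computed.

```python
# Toy sketch of the two Methods steps. All data and helper names are hypothetical.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormal_basis(columns):
    """Modified Gram-Schmidt: orthonormal basis for the span of the columns."""
    basis = []
    for col in columns:
        v = list(col)
        for q in basis:
            c = dot(q, v)
            v = [vi - c * qi for vi, qi in zip(v, q)]
        norm = dot(v, v) ** 0.5
        if norm > 1e-12:  # skip linearly dependent columns
            basis.append([vi / norm for vi in v])
    return basis

def residualize(y, basis):
    """Least-squares residuals of y after regressing on the span of `basis`."""
    r = list(y)
    for q in basis:
        c = dot(q, r)
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return r

def first_pc_scores(features, n_iter=200):
    """Per-sample PC1 scores (up to scale and sign), via power iteration
    on the sample-by-sample Gram matrix of the centered features."""
    n = len(features[0])
    v = [float(i + 1) for i in range(n)]  # arbitrary deterministic start
    for _ in range(n_iter):
        w = [0.0] * n
        for x in features:  # w = sum over features of x * (x . v)
            c = dot(x, v)
            w = [wi + c * xi for wi, xi in zip(w, x)]
        norm = dot(w, w) ** 0.5
        v = [wi / norm for wi in w]
    return v

# Simulated samples: `group` stands in for ancestry and is used only to
# generate the toy data, never as a model input.
group = [0, 0, 0, 0, 1, 1, 1, 1]
sex = [0, 1, 0, 1, 0, 1, 0, 1]
age = [30, 40, 50, 60, 35, 45, 55, 65]
n = len(age)

# Methylation values at CpGs overlapping common SNPs (one list per CpG).
cpgs = [
    [0.2 + 0.5 * g + 0.004 * a + 0.1 * s for g, a, s in zip(group, age, sex)],
    [0.8 - 0.4 * g + 0.002 * a for g, a in zip(group, age)],
    [0.3 + 0.45 * g - 0.05 * s for g, s in zip(group, sex)],
]
# Genotype calls from rs probes, assumed coded as 0/1/2 (one list per probe).
rs_calls = [
    [0, 0, 1, 0, 2, 2, 1, 2],
    [2, 2, 2, 1, 0, 0, 1, 0],
]

# Step 1: residualize each CpG for the covariates (intercept, sex, age).
covariate_basis = orthonormal_basis([[1.0] * n, sex, age])
residual_cpgs = [residualize(c, covariate_basis) for c in cpgs]

# Step 2: combine residualized CpGs with centered genotype calls, then PCA.
intercept_only = orthonormal_basis([[1.0] * n])
centered_calls = [residualize(g, intercept_only) for g in rs_calls]
pc1 = first_pc_scores(residual_cpgs + centered_calls)

print([round(s, 3) for s in pc1])
```

In this toy data the residualized CpGs no longer carry any linear sex or age signal, so the first PC is driven by the simulated group structure rather than by those covariates. A real analysis would use a linear algebra library and compute several PCs rather than this hand-rolled power iteration.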

Results: The PCs generated by EpiAnceR + led to improved clustering for repeated samples from the same individual and stronger associations with genetic ancestry groups predicted from genotype information compared to the original approach.

Conclusions: We show that the EpiAnceR + approach improves the adjustment for genetic ancestry in DNA methylation studies. EpiAnceR + can be integrated into existing R pipelines for commercial methylation arrays, such as 450K, EPICv1, and EPICv2. The code is available on GitHub (https://github.com/KiraHoeffler/EpiAnceR).
