Kira D Höffler, Seyma Katrinli, Matthew W Halvorsen, Anne-Kristin Stavrum, Kevin S O'Connell, Alexey Shadrin, Srdjan Djurovic, Ole A Andreassen, James J Crowley, Jan Haavik, Kristen Hagen, Gerd Kvale, Kerry Ressler, Bjarne Hansen, Jair C Soares, Gabriel R Fries, Alicia K Smith, Stéphanie Le Hellard
Research Square. Published 2025-05-18. DOI: https://doi.org/10.21203/rs.3.rs-6580295/v1
Optimizing Genetic Ancestry Adjustment in DNA Methylation Studies: A Comparative Analysis of Approaches.
Background: Genetic ancestry is an important factor to account for in DNA methylation studies because genetic variation influences DNA methylation patterns. One approach uses principal components (PCs) calculated from CpG sites that overlap with common SNPs to adjust for ancestry when genotyping data is not available. However, this method does not remove technical and biological variations, such as sex and age, prior to calculating the PCs. The first PC is therefore often associated with factors other than ancestry.
Methods: We developed the adapted EpiAnceR+ approach, which includes 1) residualizing the CpG data overlapping with common SNPs for control probe PCs, sex, age, and cell type proportions to remove the effects of technical and biological factors, and 2) integrating the residualized data with genotype calls from the SNP probes (commonly referred to as rs probes) present on the arrays before calculating PCs. We then evaluated the clustering ability of the resulting PCs and their relationship to genetic ancestry.
Results: The PCs generated by EpiAnceR+ led to improved clustering of repeated samples from the same individual and to stronger associations with genetic ancestry groups predicted from genotype information, compared to the original approach.
Conclusions: We show that the EpiAnceR+ approach improves the adjustment for genetic ancestry in DNA methylation studies. EpiAnceR+ can be integrated into existing R pipelines for commercial methylation arrays, such as 450K, EPICv1, and EPICv2. The code is available on GitHub (https://github.com/KiraHoeffler/EpiAnceR).
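The two steps described in the Methods can be illustrated with a minimal sketch. This is not the authors' R implementation (see the GitHub repository for that); it is a Python/NumPy illustration on simulated data. The function names `residualize` and `ancestry_pcs`, the simulated effect sizes, and the choice to center and scale every column before the PCA are all assumptions made for this example.

```python
import numpy as np


def residualize(values, covariates):
    """Regress each column of `values` on `covariates` plus an intercept
    and return the residuals (hypothetical helper for this sketch)."""
    design = np.column_stack([np.ones(values.shape[0]), covariates])
    coef, *_ = np.linalg.lstsq(design, values, rcond=None)
    return values - design @ coef


def ancestry_pcs(cpg_values, covariates, snp_calls, n_pcs=2):
    """Step 1: residualize CpG data for the covariates.
    Step 2: append genotype calls from the SNP (rs) probes, then run a PCA."""
    resid = residualize(cpg_values, covariates)
    combined = np.column_stack([resid, snp_calls]).astype(float)
    # Assumption of this sketch: center and scale each column so that
    # methylation residuals and 0/1/2 genotype calls are on a comparable scale.
    combined -= combined.mean(axis=0)
    sd = combined.std(axis=0)
    combined /= np.where(sd > 0, sd, 1.0)
    u, s, _ = np.linalg.svd(combined, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]  # sample scores on the first PCs


# Simulated data: two ancestry groups, with age and sex as nuisance covariates.
rng = np.random.default_rng(0)
n = 60
group = np.repeat([0, 1], n // 2)
age = rng.uniform(20, 70, size=n)
sex = rng.integers(0, 2, size=n)
covariates = np.column_stack([age, sex])

# 50 CpGs overlapping common SNPs: group shift + age effect + noise.
group_effect = rng.uniform(0.05, 0.10, size=50)
cpg = (0.5
       + group[:, None] * group_effect
       + 0.004 * age[:, None]
       + rng.normal(0, 0.02, size=(n, 50)))

# 10 SNP probes with group-specific allele frequencies, coded 0/1/2.
allele_freq = np.where(group == 1, 0.8, 0.2)[:, None]
snp = rng.binomial(2, allele_freq, size=(n, 10))

pcs = ancestry_pcs(cpg, covariates, snp)
pc1_group_corr = abs(np.corrcoef(pcs[:, 0], group)[0, 1])
```

Here `covariates` stands in for the control probe PCs, sex, age, and cell type proportions named in the Methods; only age and sex are simulated. Because the nuisance effects are regressed out before the PCA, the leading PC in this toy setup is driven by the simulated group structure rather than by age.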