Tatiana Nazarenko, Charlotte Dafni Vavourakis, Allison Jones, Iona Evans, Lena Schreiberhuber, Christine Kastner, Isma Ishaq-Parveen, Elisa Redl, Anthony W. Watson, Kirsten Brandt, Clive Carter, Alexey Zaikin, Chiara Maria Stella Herzog, Martin Widschwendter
Journal: Clinical Epigenetics (impact factor 4.8; JCR Q1, Genetics & Heredity)
Published: 2024-09-18
DOI: https://doi.org/10.1186/s13148-024-01739-2
Technical and biological sources of unreliability of Infinium probes on Illumina methylation microarrays
The Illumina Methylation array platform has facilitated countless epigenetic studies on DNA methylation (DNAme) in health and disease, yet relatively few studies have so far examined its reliability, i.e., the consistency of repeated measures. Here we investigate the reliability of both type I and type II Infinium probes. We propose a method for excluding unreliable probes based on dynamic thresholds for mean intensity (MI) and ‘unreliability’, where unreliability is estimated by probe-level simulation of the influence of technical noise on methylation β values, using the background intensities of negative control probes. We validate our method in several datasets, including newly generated Illumina MethylationEPIC BeadChip v1.0 data from paired whole blood samples taken six weeks apart and technical replicates spanning multiple sample types. Our analysis revealed that probes with low MI, specifically, exhibit higher β value variability between repeated samples. MI was associated with the number of C-bases in the respective probe sequence and correlated negatively with unreliability scores. The unreliability scores were substantiated through validation in a new EPIC v1.0 (blood and cervix) dataset and a publicly available 450k (blood) dataset, as they effectively captured the variability observed in β values between technical replicates. Finally, despite promising higher robustness, the newer version v2.0 of the MethylationEPIC BeadChip retained a substantial number of probes with poor unreliability scores. To enhance current pre-processing pipelines, we developed an R package to calculate MI and unreliability scores and provide guidance on establishing optimal dynamic score thresholds for a given dataset.
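The idea of estimating unreliability by simulating technical noise on β values can be illustrated with a small sketch. This is not the authors' R package or their exact scoring rule, which the abstract does not specify. The sketch assumes that MI is the mean of the methylated and unmethylated signals, that β is computed as M / (M + U + offset), and that noise is resampled from mean-centred negative control intensities; the function names `beta_value`, `mean_intensity`, and `simulated_unreliability` are hypothetical.

```python
import numpy as np


def beta_value(meth, unmeth, offset=100.0):
    """Beta value M / (M + U + offset); the offset of 100 is an assumed convention."""
    return meth / (meth + unmeth + offset)


def mean_intensity(meth, unmeth):
    """Assumed MI definition: average of methylated and unmethylated signal."""
    return (np.asarray(meth, dtype=float) + np.asarray(unmeth, dtype=float)) / 2.0


def simulated_unreliability(meth, unmeth, background, n_sim=1000, seed=0):
    """Per-probe spread of beta under resampled background noise.

    Illustrative score only: the standard deviation of simulated beta
    values, where the noise is drawn from mean-centred negative control
    intensities and added independently to both channels.
    """
    rng = np.random.default_rng(seed)
    meth = np.asarray(meth, dtype=float)
    unmeth = np.asarray(unmeth, dtype=float)
    background = np.asarray(background, dtype=float)

    # Centre the background so it acts as zero-mean technical noise.
    noise = background - background.mean()

    # Draw one noise value per simulation, probe, and channel.
    noise_m = rng.choice(noise, size=(n_sim, meth.size))
    noise_u = rng.choice(noise, size=(n_sim, meth.size))

    # Intensities cannot be negative, so clip at zero.
    sim_m = np.clip(meth + noise_m, 0.0, None)
    sim_u = np.clip(unmeth + noise_u, 0.0, None)

    return beta_value(sim_m, sim_u).std(axis=0)


# Two hypothetical probes: one dim, one bright, both half methylated.
meth = np.array([200.0, 5000.0])
unmeth = np.array([200.0, 5000.0])
# Made-up negative control intensities, for illustration only.
background = np.array([250.0, 300.0, 350.0, 280.0, 320.0, 400.0, 200.0, 310.0])

print("MI:", mean_intensity(meth, unmeth))
print("unreliability:", simulated_unreliability(meth, unmeth, background))
```

In this toy setup the dim probe receives a larger score than the bright one, because the same absolute noise moves β further when M + U is small. That mirrors the negative correlation between MI and unreliability reported above. Choosing actual thresholds on either score would depend on the dataset, as the abstract notes.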
About the journal:
Clinical Epigenetics, the official journal of the Clinical Epigenetics Society, is an open access, peer-reviewed journal that encompasses all aspects of epigenetic principles and mechanisms in relation to human disease, diagnosis and therapy. Clinical trials and research in disease model organisms are particularly welcome.