Tumor purity estimated from bulk DNA methylation can be used for adjusting beta values of individual samples to better reflect tumor biology.
Iñaki Sasiain, Deborah F Nacer, Mattias Aine, Srinivas Veerla, Johan Staaf
NAR Genomics and Bioinformatics (published 2024-11-04). DOI: 10.1093/nargab/lqae146
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11532792/pdf/
Abstract
Epigenetic deregulation through altered DNA methylation is a fundamental feature of tumorigenesis, but tumor data from bulk tissue samples contain different proportions of malignant and non-malignant cells that may confound the interpretation of DNA methylation values. The adjustment of DNA methylation data based on tumor purity has been proposed to render both genome-wide and gene-specific analyses more precise, but it requires sample purity estimates. Here we present PureBeta, a single-sample statistical framework that uses genome-wide DNA methylation data to first estimate sample purity and then adjust methylation values of individual CpGs to correct for sample impurity. Purity values estimated with the algorithm have high correlation (>0.8) to reference values obtained from DNA sequencing when applied to samples from breast carcinoma, lung adenocarcinoma, and lung squamous cell carcinoma. Methylation beta values adjusted based on purity estimates have a more binary distribution that better reflects theoretical methylation states, thus facilitating improved biological inference as shown for BRCA1 in breast cancer. PureBeta is a versatile tool that can be used for different Illumina DNA methylation arrays and can be applied to individual samples of different cancer types to enhance biological interpretability of methylation data.
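The two steps described above, first estimating purity and then correcting each CpG, can be illustrated with a minimal sketch. It assumes a simple two-compartment mixture model, observed beta = p * tumor beta + (1 - p) * normal beta, with reference tumor and normal beta values available for each CpG. This is a textbook model swapped in for illustration, not PureBeta's published algorithm. The function names `estimate_purity` and `adjust_beta` and the `min_gap` threshold are hypothetical.

```python
import numpy as np


def estimate_purity(beta_obs, beta_normal, beta_tumor_ref, min_gap=0.5):
    """Estimate sample purity from informative CpGs (illustrative assumption).

    Hypothetical approach, not PureBeta's method: keep CpGs whose reference
    tumor and normal betas differ by at least `min_gap`, invert the mixture
    beta_obs = p * beta_tumor + (1 - p) * beta_normal for p at each CpG,
    and return the median of the per-CpG estimates, clipped to [0, 1].
    """
    beta_obs = np.asarray(beta_obs, dtype=float)
    beta_normal = np.asarray(beta_normal, dtype=float)
    beta_tumor_ref = np.asarray(beta_tumor_ref, dtype=float)
    gap = beta_tumor_ref - beta_normal
    informative = np.abs(gap) >= min_gap
    if not informative.any():
        raise ValueError("no informative CpGs with |tumor - normal| >= min_gap")
    per_cpg = (beta_obs[informative] - beta_normal[informative]) / gap[informative]
    return float(np.clip(np.median(per_cpg), 0.0, 1.0))


def adjust_beta(beta_obs, purity, beta_normal):
    """Remove the assumed normal-cell contribution from observed betas.

    Solves beta_obs = purity * beta_tumor + (1 - purity) * beta_normal for
    beta_tumor and clips the result to the valid beta range [0, 1].
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    beta_obs = np.asarray(beta_obs, dtype=float)
    beta_normal = np.asarray(beta_normal, dtype=float)
    beta_tumor = (beta_obs - (1.0 - purity) * beta_normal) / purity
    return np.clip(beta_tumor, 0.0, 1.0)


# Usage: simulate a 60% pure sample, then recover purity and tumor betas.
beta_normal = np.array([0.05, 0.90, 0.10, 0.80])
beta_tumor = np.array([0.95, 0.10, 0.90, 0.05])
observed = 0.6 * beta_tumor + 0.4 * beta_normal
purity = estimate_purity(observed, beta_normal, beta_tumor)
corrected = adjust_beta(observed, purity, beta_normal)
```

Dividing by purity amplifies measurement noise in low-purity samples, which is why the sketch clips corrected values to [0, 1]; the pull of intermediate betas toward 0 or 1 mirrors the more binary distribution the abstract reports for adjusted values.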