Epigenomic exploration of disease status of EGFR-mutated non-small cell lung cancer using plasma cell-free DNA hydroxymethylomes

IF 20.1 | Zone 1 (Medicine) | JCR Q1 (Oncology)
Yong Peng, Jason Karpus, Jyoti D. Patel, Everett E. Vokes, Marina Chiara Garassino, Kirsteen Lugtu, Zhou Zhang, Wei Zhang, Mengjie Chen, Chuan He, Christine M. Bestvina
Cancer Communications, 45(1): 51-55. Published 2024-11-11. DOI: 10.1002/cac2.12606
Publisher page: https://onlinelibrary.wiley.com/doi/10.1002/cac2.12606
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11758164/pdf/
Citations: 0

Abstract


Epigenomic exploration of disease status of EGFR-mutated non-small cell lung cancer using plasma cell-free DNA hydroxymethylomes


Non-small cell lung cancer (NSCLC) represents about 85% of histological diagnoses of lung cancer [1]. Epidermal growth factor receptor (EGFR) mutations occur in 12.7%-40.3% of NSCLC [2], and 5-hydroxymethylcytosine (5hmC) signatures and pathways can be inhibited by EGFR signaling [3]. The epigenome of plasma cell-free DNA (cfDNA), including 5hmC, has demonstrated promise as a cancer biomarker [4]. Currently, it remains unknown whether cfDNA 5hmC can identify disease status of NSCLC. Here, we performed 5hmC Seal-sequencing of 302 plasma cfDNA samples from 113 patients with metastatic EGFR-mutated NSCLC, which included 240 samples reflecting stable disease (SD) and 62 samples reflecting progressive disease (PD) (Figure 1A, Supplementary Table S1). SD and PD were clinically defined by the treating physician (Supplementary Methods).

Sample quality was confirmed, 11 outlier samples were discarded, and batch effects were effectively removed (Supplementary Figures S1, Supplementary Tables S2-S3). The remaining 291 samples were classified by disease status and by various potential confounding factors (Figure 1A, Supplementary Tables S4-S7). The relative frequency of disease status in each group was nearly identical to that in the overall set of 291 samples (Supplementary Figure S4). The cfDNA 5hmC peaks of each sample were reproducible (Supplementary Figure S5A). Interestingly, 123 cfDNA 5hmC peaks were located on the EGFR gene (Supplementary Figure S5B, Supplementary Table S8). Genome-wide cfDNA 5hmC levels were broadly similar between PD and SD samples, and across the groups defined by the potential confounders (Supplementary Figures S5C-E and S6).

A substantial portion of 5hmC peaks showed highly heterogeneous 5hmC levels among the 291 samples (Supplementary Figure S7A), and this heterogeneity was not explained by disease status or the potential confounders (Supplementary Figure S7B-E, Supplementary Table S9). Similar results were observed with 1,000 bp bins in place of peaks (Supplementary Figure S8). We found that EGFR mutations were associated with 5hmC heterogeneity (Supplementary Figure S9A) and identified 4,743 cfDNA 5hmC peaks (Supplementary Table S10) whose 5hmC levels differed more between EGFR mutation subtype groups than within them (P < 0.005) (Supplementary Figure S9B). Interestingly, these 4,743 cfDNA 5hmC peaks were strongly associated with EGFR function (Supplementary Figure S10A) but not with disease status (Supplementary Figure S10B-E). This result was further supported by the nearly identical 5hmC levels of PD and SD samples (Supplementary Figure S10F) and by the distributions of false discovery rates and P values (Figure 1B).
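The between-group versus within-group comparison above can be illustrated with a one-way ANOVA F statistic. This is only a sketch of one common way to quantify such a contrast: the text does not state which test produced the P < 0.005 threshold, and the function name `f_statistic` and the toy 5hmC values are hypothetical.

```python
def f_statistic(groups):
    """One-way ANOVA F statistic: between-group variance over within-group variance.

    `groups` is a list of lists, one list of 5hmC levels per (hypothetical)
    EGFR mutation subtype. This is an assumed stand-in for the unspecified
    test used in the study.
    """
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of samples
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the samples around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Toy peak whose level tracks the subtype (large F) ...
subtype_linked_peak = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
# ... and a toy peak whose group means are identical (F = 0).
unlinked_peak = [[1.0, 5.0, 9.0], [2.0, 5.0, 8.0], [3.0, 5.0, 7.0]]

f_linked = f_statistic(subtype_linked_peak)
f_unlinked = f_statistic(unlinked_peak)
```

A peak is flagged as subtype-associated when its F statistic is large enough that the corresponding P value falls below the chosen threshold; converting F to a P value requires the F distribution, which is omitted here.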

Disease status-associated 5hmC peaks were completely distinct from the peaks associated with potential confounders, with the exception of smoking status (Supplementary Figure S11). Consistent with this, the 5hmC levels of SD and PD samples differed significantly on smoking status-associated peaks, but not on sex-, age-, or race-associated peaks (Supplementary Figure S12A). Comparisons between any two of the three smoking statuses or between the two disease statuses shared 123, 282, 106, and 58 differential 5hmC peaks, respectively (Supplementary Figure S12B-C, Supplementary Tables S11-S14). These four groups of shared 5hmC peaks showed different 5hmC levels between PD and SD samples (Figure 1C) and could classify both disease status and smoking status (Figure 1D, Supplementary Figure S13D-E). Overall, although 5hmC levels varied with patient characteristics, only smoking status affected disease status-associated 5hmC.

The hyper- or hypo-hydroxymethylated 5hmC peaks from PD versus SD samples (Supplementary Tables S15-S16) could not distinguish subgroups defined by sex, race, age, smoking status, or EGFR mutation (Figure 1E, Supplementary Figure S13A-C). They correlated with disease status only, and not with the potential confounders (Figure 1F, Supplementary Figure S13D-E). Functional enrichment analysis showed that the hyper-5hmC peaks were closely associated with lung development, vital capacity, and smoking (Supplementary Figure S14A-C). Interestingly, the hypo-5hmC peaks were not directly associated with lung function but may affect disease status through the immune system, for example through T cell activation, leukocyte adhesion, and IgM levels (Supplementary Figure S14D-E). Like the hyper-5hmC peaks, the hypo-5hmC peaks were also associated with smoking behaviors and forced expiratory volume (Supplementary Figure S14F).

The lung function- and immune system-associated 5hmC peaks were located mainly in gene bodies rather than in intergenic regions (Figure 1G); one example is a hyper-5hmC peak in an intron of the thyroid hormone receptor beta (THRB) gene (Supplementary Figure S15A), which regulates lung development [5]. Some important lung function-associated genes were hyper-hydroxymethylated (Figure 1H, Supplementary Figure S15B), whereas hypo-5hmC peaks were located in the gene bodies of immune-associated genes (Supplementary Figure S15C, Supplementary Table S17). Regulatory elements and lung enhancers were enriched in the gene body, promoter, or intergenic regions of the hyper- or hypo-5hmC peaks (Figure 1I, Supplementary Figure S15D). Motifs and binding regions of some lung function-associated transcription factors (TFs) were also enriched in the hyper-5hmC peaks (Figure 1J, Supplementary Figure S15E). Taken together, cfDNA 5hmC peaks that depend on disease status and are independent of patient characteristics can be linked to lung development, smoking behavior, and immune response, as well as to lung function-associated enhancers and TF-binding sites (Supplementary Figure S16).

We optimized 888 peaks (Supplementary Table S18) from the differential 5hmC peaks to build a logistic regression model with an area under the receiver operating characteristic curve (AUC) of 0.998 using appropriate cutoffs of the output probabilities (Figure 1K, Supplementary Figure S17A-B). Based on the 888 peaks, unsupervised clustering could discriminate PD and SD samples with 100% accuracy, but could not discriminate groups defined by sex, race, age, smoking status, or EGFR mutation subtype (Figure 1L). The AUC of the model for predicting disease status was much greater than the AUCs for classifying age, sex, race, or smoking status (Figure 1M, Supplementary Figure S17C). Our cfDNA 5hmC-based logistic regression model could discriminate disease status accurately, sensitively, and specifically, and was independent of potential confounding factors in NSCLC (Figure 1M, Supplementary Figure S17D-E). The 888 peaks could not distinguish the 10 treatment-naïve samples from the 49 previously treated samples (Supplementary Figure S17F-G).
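As an illustration of the metrics reported above, the following minimal sketch computes an AUC and the sensitivity and specificity at a probability cutoff from a model's output probabilities. It is not the authors' pipeline: the function names, the label coding (1 for PD, 0 for SD), and the toy values are assumptions made for this example only.

```python
from typing import Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive sample scores higher
    than a random negative sample (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], cutoff: float
) -> Tuple[float, float]:
    """Call a sample positive when its probability is >= cutoff."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cutoff)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < cutoff)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < cutoff)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: 1 = PD, 0 = SD (assumed coding, not study data).
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1]
print(roc_auc(toy_labels, toy_scores))
print(sensitivity_specificity(toy_labels, toy_scores, cutoff=0.45))
```

The pairwise formulation is quadratic in the number of samples, which is acceptable for a few hundred samples; a rank-based computation would be the usual choice for larger cohorts.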

As expected, most of the 888 peaks were located in gene bodies, and multiple peaks could fall on the same gene (Figure 1N). Interestingly, three genes that each carried three of the 888 peaks (Figure 1N) were strongly associated with lung function and lung cancer [6, 7], and were highly expressed in various cancers including lung cancer (Supplementary Figure S18A). More importantly, lung cancer patients with different expression levels of the retinoic acid-induced 14 (RAI14 or NORPEG) gene demonstrated a 16% difference in survival probability (Supplementary Figure S18B). Furthermore, some of the 30 genes with two peaks (Figure 1N), such as leptin receptor (LEPR) and F-box and leucine rich repeat protein 7 (FBXL7), were associated with lung function [8, 9] and with the survival probability of patients with lung cancer (Supplementary Figure S18C-D), and exhibited high expression levels in lung cancer (Supplementary Figure S18E). To determine 5hmC biomarkers for disease status, the top 63 cfDNA 5hmC peaks (Supplementary Table S19), ranked by the absolute values of their logistic regression coefficients, were selected from the 888 peaks. The logistic model based on these 63 5hmC peaks also achieved high performance (Figure 1O, Supplementary Figure S19A-B). Some of the genes associated with the 63 peaks not only played an important role in lung function but also correlated with lung cancer survival probability (Supplementary Figure S19C-E).
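The selection step described above, keeping the peaks with the largest absolute logistic regression coefficients, can be sketched as follows. This is a minimal illustration under stated assumptions: the peak identifiers and coefficient values are hypothetical, and the alphabetical tie-breaking rule is a choice made here, not something the study specifies.

```python
from typing import Dict, List


def top_k_by_abs_coefficient(coefficients: Dict[str, float], k: int) -> List[str]:
    """Return the k feature names with the largest |coefficient|.

    Ties are broken alphabetically by name so the result is deterministic
    (an assumption of this sketch).
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    ranked = sorted(coefficients, key=lambda name: (-abs(coefficients[name]), name))
    return ranked[:k]


# Hypothetical coefficients for four peaks (not values from the study).
toy_coefficients = {
    "peak_a": 0.20,
    "peak_b": -1.50,
    "peak_c": 0.90,
    "peak_d": -0.05,
}
print(top_k_by_abs_coefficient(toy_coefficients, k=2))
```

Ranking by absolute value treats strongly negative and strongly positive coefficients alike, so both hypo- and hyper-hydroxymethylated peaks can be retained. Coefficient magnitudes are only comparable across peaks when the input features are on a common scale.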

Overall, we found that smoking status affected disease status-associated cfDNA 5hmC. We showed that lung function and regulatory elements were enriched in disease status-associated 5hmC peaks, which could discriminate progressive from stable NSCLC with high sensitivity and specificity. Our results indicate that different treatment responses are epigenomically distinguishable and nominate cfDNA 5hmC profiling as a non-invasive, cost-effective, and universally applicable approach to monitoring disease status.

C.M.B. reports research funding to the institution from AstraZeneca and BMS; advisory boards and personal consulting payments from Amgen, AstraZeneca, BMS, CVS, Daiichi Sankyo, EMD Serono, Gilead, Guardant, JNJ, Mirati, Novocure, Sanofi, Tempus and Turning Point Therapeutics. M.C.G. reports funding to the institution from Eli Lilly, MSD, Pfizer (MISP); AstraZeneca, MSD International GmbH, BMS, Boehringer Ingelheim Italia S.p.A, Celgene, Eli Lilly, Ignyta, Incyte, MedImmune, Novartis, Pfizer, Roche, Takeda, Tiziana, Foundation Medicine, Glaxo Smith Kline GSK, Spectrum Pharmaceuticals. M.C.G. reports advisory boards and personal consulting payments from AstraZeneca, MSD International GmbH, Bayer, BMS, Boehringer Ingelheim Italia S.p.A, Celgene, Eli Lilly, Incyte, Novartis, Pfizer, Roche, Takeda, Seattle Genetics, Mirati, Daiichi Sankyo, Regeneron, Merck, Blueprint, Janssen, Sanofi, AbbVie, BeiGenius, Oncohost. The remaining authors report no competing interests.

No funding was received for this project.

This study was approved by the local Institutional Review Board according to the U.S. Common Rule ethical guidelines. All patients provided consent to a general thoracic biobanking study under IRB 18-1319, which allowed for utilization of samples collected under IRB 9571.

Source journal: Cancer Communications (Biochemistry, Genetics and Molecular Biology - Cancer Research)
CiteScore: 25.50
Self-citation rate: 4.30%
Articles published: 153
Review time: 4 weeks
About the journal: Cancer Communications is an open access, peer-reviewed online journal that encompasses basic, clinical, and translational cancer research. The journal welcomes submissions concerning clinical trials, epidemiology, molecular and cellular biology, and genetics.