Identification of key genes associated with survival of glioblastoma multiforme using integrated analysis of TCGA datasets

Computer Methods and Programs in Biomedicine Update, Volume 2, Article 100051. Pub Date: 2022-01-01. DOI: 10.1016/j.cmpbup.2022.100051

Seema Sandeep Redekar, Satishkumar L. Varma, Atanu Bhattacharjee


Citations: 6

Abstract

Background and Objective

Glioblastoma (GBM) is the most aggressive type of brain tumor. Despite the various treatment options available, GBM patients usually have a poor prognosis. Genetic markers play a vital role in the progression of the disease, and identifying novel molecular biomarkers is essential to explain the mechanisms of GBM or improve its prognosis. Advances in high-throughput genomic technologies enable the analysis of varied types of omics data to find biomarkers in GBM. Although data repositories such as The Cancer Genome Atlas (TCGA) are rich sources of multi-omics data, integrating genomic datasets that vary in quality and come from a heterogeneous patient population is challenging.

Methods

Multi-omics datasets for GBM patients, consisting of DNA methylation, RNA sequencing, and copy number variation (CNV) data, are obtained from TCGA to carry out the analysis. A Cox proportional hazards regression model is developed in R to identify genes in these datasets that are significantly associated with patient survival. The Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) are used to evaluate the model. Validation is performed to determine the accuracy and the corresponding prediction error.
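The model-fitting and model-selection steps described above can be illustrated with a minimal sketch. The abstract states that the analysis was done in R; the Python below is only an illustrative stand-in and not the authors' code. It assumes a single covariate, the Breslow form of the partial likelihood, Newton-Raphson fitting, and the number of events as the sample size in BIC (one common convention). The toy data and all function names are invented for illustration.

```python
import math

def cox_partial_loglik(beta, times, events, x):
    """Breslow log partial likelihood with its first and second derivatives."""
    ll = grad = hess = 0.0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored subjects contribute only through risk sets
        # Risk set: everyone still under observation at this event time.
        risk = [j for j in range(len(times)) if times[j] >= times[i]]
        w = [math.exp(beta * x[j]) for j in risk]
        s0 = sum(w)
        s1 = sum(wj * x[j] for wj, j in zip(w, risk))
        s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
        ll += beta * x[i] - math.log(s0)
        grad += x[i] - s1 / s0
        hess -= s2 / s0 - (s1 / s0) ** 2
    return ll, grad, hess

def fit_cox(times, events, x, iters=25):
    """Newton-Raphson for a single coefficient, starting from beta = 0."""
    beta = 0.0
    for _ in range(iters):
        _, grad, hess = cox_partial_loglik(beta, times, events, x)
        if abs(grad) < 1e-10 or hess == 0.0:
            break
        beta -= grad / hess
    return beta

def aic_bic(loglik, k, n):
    """AIC = 2k - 2 log L; BIC = k ln(n) - 2 log L."""
    return 2 * k - 2 * loglik, k * math.log(n) - 2 * loglik

# Toy data (invented): follow-up time, event flag (1 = death), expression value.
times = [5, 8, 12, 14, 20, 25, 30, 40]
events = [1, 1, 1, 0, 1, 1, 0, 1]
expr = [0.2, 1.9, 0.5, 2.0, 0.8, 2.1, 1.3, 1.4]

beta_hat = fit_cox(times, events, expr)
ll_fit, grad_fit, _ = cox_partial_loglik(beta_hat, times, events, expr)
ll_null, _, _ = cox_partial_loglik(0.0, times, events, expr)
n_events = sum(events)  # assumed sample size for BIC

print("beta:", round(beta_hat, 3))
print("AIC, BIC with gene:", aic_bic(ll_fit, 1, n_events))
print("AIC, BIC null model:", aic_bic(ll_null, 0, n_events))
```

A negative coefficient means higher expression goes with a lower hazard. A candidate gene is worth keeping when its model has lower AIC and BIC than the null model; with real data, an established implementation such as the R `survival` package would normally be used instead of hand-written code.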

Results

Five key genes are identified from each of the DNA methylation and RNA sequencing datasets: ANK1, HOXA9, TOX2, CXCR6, and PIGZ from DNA methylation, and L3MBTL, KDM5B, CCDC138, NUS1P1, and ARHGAP42 from RNA sequencing. Higher expression values of these genes are associated with better survival of GBM patients, and Kaplan-Meier curves illustrate this association. Low AIC and BIC values indicate that the model is suitable. The prediction model is validated on the test set, where it shows a low error rate. The CNV data are also analysed to find significant chromosomal locations in GBM patients; these lie on chromosomes 2, 5, 6, 7, 12, and 13. In all, nine CNV locations are found to influence the progression of GBM.
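The Kaplan-Meier comparison mentioned above can be sketched as follows. This is an illustrative Python example rather than the authors' R code. Splitting patients at the median expression value is an assumption, since the abstract does not say how the groups were formed, and the toy data and function names are invented.

```python
import statistics

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (product-limit estimate)."""
    surv, curve = 1.0, []
    for t in sorted({u for u, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1 - deaths / at_risk
        curve.append((t, surv))
    return curve

def split_by_median(expr, times, events):
    """Dichotomise patients at the median expression (an assumed rule)."""
    cut = statistics.median(expr)
    groups = {"high": ([], []), "low": ([], [])}
    for x, t, e in zip(expr, times, events):
        key = "high" if x > cut else "low"
        groups[key][0].append(t)
        groups[key][1].append(e)
    return groups

# Toy data (invented): follow-up time, event flag (1 = death), expression value.
times = [5, 8, 12, 14, 20, 25, 30, 40]
events = [1, 1, 1, 0, 1, 1, 0, 1]
expr = [0.2, 1.9, 0.5, 2.0, 0.8, 2.1, 1.3, 1.4]

groups = split_by_median(expr, times, events)
for name, (t, e) in groups.items():
    print(name, kaplan_meier(t, e))
```

Each curve drops only at observed deaths; censored patients leave the risk set without lowering the estimate. A formal comparison of the two groups would usually add a log-rank test, which this sketch omits.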

Conclusion

An integrated analysis of multiple omics datasets is carried out to identify significant genes from the DNA methylation and RNA sequencing profiles of 76 individuals common to both datasets. The copy number variation dataset for the same patients is analyzed to identify notable locations across 22 chromosomes. The survival analysis determines the correlation of these biomarkers with the progression of the disease.
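Restricting the analysis to patients present in every dataset can be sketched as below. This is an illustrative Python example under stated assumptions, not the authors' procedure: it assumes samples are labeled with TCGA barcodes, whose first three dash-separated fields (project, tissue source site, participant) identify the patient. The barcodes shown are made-up examples in that format, and the function names are hypothetical.

```python
def patient_id(barcode):
    """Keep the project, tissue-source-site, and participant fields."""
    return "-".join(barcode.split("-")[:3])

def common_patients(*sample_lists):
    """Patients that appear in every list of sample barcodes."""
    id_sets = [{patient_id(b) for b in samples} for samples in sample_lists]
    return sorted(set.intersection(*id_sets))

# Made-up barcodes for three data types.
methylation = ["TCGA-02-0001-01C-01D-0182-01", "TCGA-02-0003-01A-01D-0182-01"]
rna_seq = ["TCGA-02-0001-01B-01R-0177-01", "TCGA-02-0007-01A-01R-0177-01"]
cnv = ["TCGA-02-0001-10A-01D-0182-01", "TCGA-02-0003-10A-01D-0182-01"]

shared = common_patients(methylation, rna_seq, cnv)
print(shared)
```

Only the shared patient identifiers would then be used to subset each dataset before the survival models are fitted.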


