TCGA Lung Cancer Analysis Pipeline

T. Zengin, Tugba Onal Suzek
{"title":"TCGA Lung Cancer Analysis Pipeline","authors":"T. Zengin, Tugba Onal Suzek","doi":"10.1145/3233547.3233615","DOIUrl":null,"url":null,"abstract":"Cancer cells contain thousands of mutated genes, differential copy numbers and differential expressions of genes. The progression of cancer differs from patient to patient. Identification of key proteins and pathways of individual patient's molecular profile has become important for personalized medicine. At the first step of our proposed pipeline, gene mutations, gene expression profile, copy number variations and clinical data of lung cancer patients (LUAD) are downloaded from TCGA. Significant genomic variations are determined by using R MADGIC and GAIA packages. Using R Deseq2 package, most active differentially expressed genes are determined for the patients (number of patients=55) for whom the adjacent normal tissue RNA-seq expression levels are available. Most active pathways are determined by Cytoscape jactivemodules program based on expression levels. For significant genomic variations and gene expression levels, MDS plot and Kaplan-Meier survival analysis of the patients is performed. The most mutated genes in 565 LUAD samples were identified by TCGA-Biolinks package. We found that TP53, a known tumor suppressor gene, has a mutation in 48% of the patients. Survival analysis for the 55 LUAD patients clustered using K-means clustering (k=2) was performed. Results show that survival probability of two clusters doesn't vary significantly. The goals of this study are to 1) computationally identify the most significant genes whose mutation and expression profile correlate with the patient survival time 2) verify the significance of results against the results of an earlier study conducted on TCGA LUAD dataset [1] and 3) provide an open-source automated pipeline.","PeriodicalId":131906,"journal":{"name":"Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"23 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3233547.3233615","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Cancer cells contain thousands of mutated genes, differential copy numbers and differential expressions of genes. The progression of cancer differs from patient to patient. Identification of key proteins and pathways of individual patient's molecular profile has become important for personalized medicine. At the first step of our proposed pipeline, gene mutations, gene expression profile, copy number variations and clinical data of lung cancer patients (LUAD) are downloaded from TCGA. Significant genomic variations are determined by using R MADGIC and GAIA packages. Using R Deseq2 package, most active differentially expressed genes are determined for the patients (number of patients=55) for whom the adjacent normal tissue RNA-seq expression levels are available. Most active pathways are determined by Cytoscape jactivemodules program based on expression levels. For significant genomic variations and gene expression levels, MDS plot and Kaplan-Meier survival analysis of the patients is performed. The most mutated genes in 565 LUAD samples were identified by TCGA-Biolinks package. We found that TP53, a known tumor suppressor gene, has a mutation in 48% of the patients. Survival analysis for the 55 LUAD patients clustered using K-means clustering (k=2) was performed. Results show that survival probability of two clusters doesn't vary significantly. The goals of this study are to 1) computationally identify the most significant genes whose mutation and expression profile correlate with the patient survival time 2) verify the significance of results against the results of an earlier study conducted on TCGA LUAD dataset [1] and 3) provide an open-source automated pipeline.
TCGA肺癌分析管道
癌细胞包含数以千计的突变基因、不同的拷贝数和不同的基因表达。癌症的进展因病人而异。个体患者分子图谱的关键蛋白和通路的识别已成为个性化医疗的重要内容。在我们提出的管道的第一步,从TCGA下载肺癌患者(LUAD)的基因突变、基因表达谱、拷贝数变异和临床数据。使用R MADGIC和GAIA包确定显著的基因组变异。使用R Deseq2软件包,对可获得邻近正常组织RNA-seq表达水平的患者(患者数=55)进行了大多数活性差异表达基因的测定。大多数活性途径是由基于表达水平的Cytoscape jactivmodules程序决定的。对于显著的基因组变异和基因表达水平,对患者进行MDS图和Kaplan-Meier生存分析。通过tcga - biollinks软件包对565份LUAD样本中突变最多的基因进行鉴定。我们发现TP53,一种已知的肿瘤抑制基因,在48%的患者中发生了突变。采用k -均值聚类(k=2)对55例LUAD患者进行生存分析。结果表明,两组患者的生存率差异不显著。本研究的目标是:1)通过计算确定突变和表达谱与患者生存时间相关的最重要基因;2)与早期在TCGA LUAD数据集[1]上进行的研究结果验证结果的重要性;3)提供一个开源的自动化管道。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信