TCGA Lung Cancer Analysis Pipeline
T. Zengin, Tugba Onal Suzek
Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics. Published 2018-08-15.
DOI: 10.1145/3233547.3233615 (https://doi.org/10.1145/3233547.3233615)
Citations: 0
Abstract
Cancer cells contain thousands of mutated genes, copy number differences and differentially expressed genes, and the progression of cancer differs from patient to patient. Identifying the key proteins and pathways in an individual patient's molecular profile has therefore become important for personalized medicine. In the first step of our proposed pipeline, gene mutations, gene expression profiles, copy number variations and clinical data of lung cancer patients (LUAD, lung adenocarcinoma) are downloaded from TCGA. Significant genomic variations are determined using the R packages MADGiC and GAIA. Using the R package DESeq2, the most active differentially expressed genes are determined for the patients for whom adjacent normal tissue RNA-seq expression levels are available (n = 55). The most active pathways are determined from expression levels with the Cytoscape jActiveModules program. For the significant genomic variations and gene expression levels, MDS plots and Kaplan-Meier survival analysis of the patients are performed. The most frequently mutated genes in 565 LUAD samples were identified with the TCGAbiolinks package. We found that TP53, a known tumor suppressor gene, is mutated in 48% of the patients. Survival analysis was performed for the 55 LUAD patients after clustering them with K-means (k=2). The results show that survival probability does not differ significantly between the two clusters. The goals of this study are to 1) computationally identify the most significant genes whose mutation and expression profiles correlate with patient survival time; 2) verify the significance of the results against those of an earlier study conducted on the TCGA LUAD dataset [1]; and 3) provide an open-source automated pipeline.
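
To make the Kaplan-Meier step concrete, the following is a minimal sketch of the estimator, S(t) = Π (1 − d_i / n_i) over event times t_i ≤ t, where d_i is the number of deaths at t_i and n_i the number of patients still at risk. It is written in plain Python for illustration only; the pipeline described above is R-based, and this is not its implementation. The function name `kaplan_meier` and the two toy clusters are assumptions introduced here, and the numbers are not TCGA values.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where a death was observed.

    times  -- follow-up time per patient
    events -- 1 if death was observed at that time, 0 if the patient was censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # every patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Patients censored at t still count as at risk at t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Hypothetical toy data (not TCGA values): follow-up in days and event flags
# for two illustrative clusters, such as those a k=2 clustering might produce.
cluster_a = kaplan_meier([5, 8, 8, 12, 20], [1, 1, 0, 1, 0])
cluster_b = kaplan_meier([3, 6, 9, 15, 15], [1, 0, 1, 1, 1])

for name, curve in (("cluster A", cluster_a), ("cluster B", cluster_b)):
    print(name, [(t, round(s, 3)) for t, s in curve])
```

Comparing the two resulting curves, typically with a log-rank test, is one common way to judge whether survival differs between clusters; the sketch stops at the curves themselves and does not perform that test.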