Continuous prediction for tumor mutation burden based on transcriptional data in gastrointestinal cancers
Beibei Hu, Guohui Yin, Jialin Zhu, Yi Bai, Xuren Sun
BMC Medical Informatics and Decision Making, 24(1):384, published 2024-12-18. DOI: 10.1186/s12911-024-02794-8
Abstract
Background: Tumor mutation burden (TMB) is considered a biomarker for guiding the use of immune checkpoint inhibitors (ICIs), but the methods used to measure it, whole exome sequencing (WES) and cancer gene panels (CGP) based on next-generation sequencing, are costly. Here, we use transcriptome data from TCGA to construct a model for TMB prediction in gastrointestinal tumors.
Methods: Transcriptome data, somatic mutation data, and clinical data were obtained from TCGA for four gastrointestinal tumors: esophageal cancer (ESCA), stomach adenocarcinoma (STAD), colon adenocarcinoma (COAD), and rectal adenocarcinoma (READ). Using R, we performed visual analysis of the somatic mutation data, functional enrichment analysis of differentially expressed genes (DEGs), and gene set enrichment analysis (GSEA), and assessed the clinical value of TMB. Finally, a deep neural network (DNN) model was constructed for TMB prediction.
Results: Visualization of somatic mutation data summarized the mutation classification, the frequency of each mutation type, and the top-mutated genes. GSEA showed enrichment of CD4+/CD8+ T cells in the high-TMB group and activation of tumor-suppressing pathways. Single-sample GSEA (ssGSEA) showed that the high-TMB group had higher levels of infiltration by multiple immune cell types. In addition, the distribution of TMB was related to clinical parameters such as age, M stage, N stage, AJCC stage, and overall survival (OS). After model optimization using a genetic algorithm, the Pearson correlation coefficient r between predicted and actual values reached 0.98, 0.82, and 0.92 in the training, validation, and testing sets, respectively; the coefficient of determination R² was 0.95, 0.82, and 0.7, respectively.
Conclusion: TMB correlates with clinicopathological parameters in gastrointestinal carcinoma, and patients with high TMB have higher levels of immune infiltration. In addition, the DNN model based on 31 genes predicts the TMB of gastrointestinal tumors with high accuracy.
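The Results report both a Pearson correlation coefficient r and a coefficient of determination R² for the predicted TMB values. The sketch below shows how the two metrics are commonly computed and why they can diverge. It assumes the standard definition R² = 1 - SS_res/SS_tot; the abstract does not state which formula the authors used. The function names and the toy values are hypothetical and are not data from the study.

```python
import math


def pearson_r(actual, predicted):
    """Pearson correlation: strength of the linear relationship."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


def r_squared(actual, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Toy values (not study data): predictions track the truth perfectly
# in a linear sense but carry a constant offset of +1.
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [2.0, 3.0, 4.0, 5.0, 6.0]

print("r  =", pearson_r(actual, predicted))
print("R2 =", r_squared(actual, predicted))
```

In this toy case r is 1 while R² is only 0.5, because r ignores systematic offset or scaling in the predictions whereas R² penalizes any deviation from the actual values. Under this definition, a high r paired with a lower R² on a held-out set is therefore possible.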
About the journal:
BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles in relation to the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.