Machine learning-driven estimation of mutational burden highlights DNAH5 as a prognostic marker in colorectal cancer.
Authors: Yangyang Fang, Tianmei Fu, Qian Zhang, Ziqing Xiong, Kuai Yu, Aiping Le
Journal: Biology Direct 19(1):116, published 2024-11-14
DOI: 10.1186/s13062-024-00564-0 (https://doi.org/10.1186/s13062-024-00564-0)
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11566893/pdf/
Citation count: 0
Abstract
Background: Tumor mutational burden (TMB) has emerged as a pivotal predictive biomarker of prognosis and response to immunotherapy in patients with colorectal cancer (CRC). Although whole exome sequencing (WES) is the gold standard for TMB assessment, it is costly and time-consuming. Additionally, the heterogeneity among high-TMB patients remains poorly characterized.
Methods: We employed eight advanced machine learning algorithms to develop gene-panel-based models for TMB estimation. To rigorously compare and validate these TMB estimation models, four external cohorts, involving 1,956 patients, were used. Furthermore, we computed the Pearson correlation coefficient between the estimated TMB and tumor neoantigen levels to elucidate their association. CD8+ tumor-infiltrating lymphocyte (TIL) density was assessed via immunohistochemistry.
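The correlation step described in the Methods can be illustrated with a minimal sketch of the Pearson correlation coefficient. The function name `pearson_r` and the input values are assumptions made for illustration only; the numbers are invented and are not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Invented illustrative values (hypothetical, not from the paper).
estimated_tmb = [2.1, 3.4, 5.0, 12.8, 40.2, 55.7]
neoantigen_count = [15, 22, 31, 90, 310, 402]

r = pearson_r(estimated_tmb, neoantigen_count)
print(f"Pearson r = {r:.3f}")
```

A value of r near 1 would indicate that samples with higher estimated TMB also tend to carry more neoantigens; the coefficient captures only linear association.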
Results: The TMB estimation model based on the Lasso algorithm, incorporating 20 genes, performed well across multiple independent cohorts (R² ≥ 0.859). This 20-gene TMB model was an independent prognostic indicator of progression-free survival (PFS) in CRC patients (p = 0.001). DNAH5 mutations were associated with a more favorable prognosis in high-TMB CRC patients and correlated strongly with tumor neoantigen levels and CD8+ TIL density.
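To make the idea of a Lasso-based gene panel concrete, the following is a rough sketch of Lasso regression fitted by coordinate descent and scored with R². It is not the authors' pipeline: the data are synthetic, and the function names, the penalty `alpha`, the gene count and the sample size are all assumptions chosen for illustration.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the Lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso(X, y, alpha, n_iter=100):
    """Minimise (1/2n)*||y - Xw - b||^2 + alpha*||w||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]  # centred features
    residual = [v - y_mean for v in y]  # equals centred y while all weights are 0
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column carries no information
            # Correlation of feature j with the residual, adding back j's own term.
            rho = sum(Xc[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * w[j]
            new_w = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * Xc[i][j]
                w[j] = new_w
    intercept = y_mean - sum(w[j] * col_means[j] for j in range(p))
    return w, intercept


def predict(X, w, intercept):
    return [intercept + sum(wj * xj for wj, xj in zip(w, row)) for row in X]


def r_squared(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic data (hypothetical): 200 samples, 30 binary "gene mutated" flags,
# of which only the first 5 genes actually drive the simulated TMB value.
rng = random.Random(0)
n_samples, n_genes = 200, 30
true_w = [8.0, 6.0, 5.0, 4.0, 3.0] + [0.0] * (n_genes - 5)
X = [[1.0 if rng.random() < 0.3 else 0.0 for _ in range(n_genes)]
     for _ in range(n_samples)]
y = [2.0 + sum(wj * xj for wj, xj in zip(true_w, row)) + rng.gauss(0.0, 0.5)
     for row in X]

w, b = fit_lasso(X, y, alpha=0.2)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
r2 = r_squared(y, predict(X, w, b))
print("selected gene indices:", selected)
print(f"training R^2 = {r2:.3f}")
```

The L1 penalty sets the weights of uninformative genes exactly to zero, which is how a Lasso model can end up with a compact panel. Here R² is computed on the training data only; the study reports R² on independent external cohorts, which is a stricter test.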
Conclusions: The 20-gene model offers a cost-efficient way to estimate TMB precisely and provides prognostic information for patients with CRC. Incorporating DNAH5 into this model further refines the categorization of patients with elevated TMB. The 20-gene model facilitates the stratification of patients with CRC, enabling more precise treatment planning.
About the journal:
Biology Direct serves the life science research community as an open access, peer-reviewed online journal, providing authors and readers with an alternative to the traditional model of peer review. Biology Direct considers original research articles, hypotheses, comments, discovery notes and reviews in subject areas currently identified as those most conducive to the open review approach, primarily those with a significant non-experimental component.