Title: TMBquant: an explainable AI-powered caller advancing tumor mutation burden quantification across heterogeneous samples
Authors: Shenjie Wang, Xiaonan Wang, Xiaoyan Zhu, Xuwen Wang, Yuqian Liu, Minchao Zhao, Zhili Chang, Yang Shao, Haitao Zhang, Shuanying Yang, Jiayin Wang
Journal: Briefings in Bioinformatics, volume 26, issue 5
Published: 2025-08-31
Publication type: Journal Article
DOI: 10.1093/bib/bbaf455 (https://doi.org/10.1093/bib/bbaf455)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12415849/pdf/
Impact factor: 7.7 (JCR Q1, Biochemical Research Methods)
Citations: 0
Abstract
Accurate tumor mutation burden (TMB) quantification is critical for immunotherapy stratification, yet remains challenging due to variability across sequencing platforms, tumor heterogeneity, and variant calling pipelines. Here, we introduce TMBquant, an explainable AI-powered caller designed to optimize TMB estimation through dynamic feature selection, ensemble learning, and automated strategy adaptation. Built upon the H2O AutoML framework, TMBquant integrates variant features, minimizes classification errors, and enhances both accuracy and stability across diverse datasets. We benchmarked TMBquant against nine widely used variant callers, including traditional tools (e.g. Mutect2, VarScan2, Strelka2) and recent AI-based methods (DeepSomatic, Octopus), using 706 whole-exome sequencing tumor-control pairs. To evaluate clinical relevance, we further assessed TMBquant through survival analyses across immunotherapy-treated cohorts of non-small cell lung cancer (NSCLC), nasopharyngeal carcinoma (NPC), and the two NSCLC subtypes: lung adenocarcinoma and lung squamous cell carcinoma. In each cohort, TMBquant consistently achieved the highest hazard ratios, demonstrating superior patient stratification compared to all other methods. Importantly, TMBquant maintained robust predictive performance across both high-TMB (NSCLC) and low-TMB (NPC) settings, highlighting its generalizability across cancer types with distinct biological characteristics. These findings establish TMBquant as a reliable, reproducible, and clinically actionable tool for precision oncology. The software is open source and freely available at https://github.com/SomaticCaller/SomaticCaller. To enhance reproducibility, we provide detailed usage instructions and representative code snippets for TMBquant in the Methods section (see Code Availability).
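For orientation, TMB is conventionally reported as the number of somatic mutations per megabase of sequenced territory. The sketch below illustrates only that final arithmetic step and is not TMBquant's implementation. The `VariantCall` type, the `is_somatic` flag (standing in for whatever label a caller or classifier assigns), the `tumor_mutation_burden` helper, and the 2.0 Mb covered region are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class VariantCall:
    """One candidate variant; a hypothetical record, not TMBquant's data model."""
    chrom: str
    pos: int
    ref: str
    alt: str
    is_somatic: bool  # assumed label from an upstream caller or classifier


def tumor_mutation_burden(calls: Iterable[VariantCall], covered_megabases: float) -> float:
    """Return somatic mutations per megabase of covered sequence."""
    if covered_megabases <= 0:
        raise ValueError("covered_megabases must be positive")
    # Count only the calls that the upstream step labeled as somatic.
    somatic_count = sum(1 for call in calls if call.is_somatic)
    return somatic_count / covered_megabases


# Toy input: five candidate calls, four of them labeled somatic.
calls = [
    VariantCall("chr1", 1_000, "A", "T", True),
    VariantCall("chr1", 2_500, "G", "C", True),
    VariantCall("chr2", 7_300, "C", "T", False),  # e.g. germline or artifact
    VariantCall("chr3", 4_200, "T", "G", True),
    VariantCall("chr7", 9_900, "G", "A", True),
]

tmb = tumor_mutation_burden(calls, covered_megabases=2.0)
print(f"TMB = {tmb:.2f} mutations/Mb")
```

Because the count in the numerator depends entirely on which calls are labeled somatic, any misclassification upstream shifts the resulting TMB value directly, which is why the choice of variant caller matters for this metric.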
About the journal:
Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data.
The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.