TMBquant: an explainable AI-powered caller advancing tumor mutation burden quantification across heterogeneous samples.

IF 7.7 · Zone 2 (Biology) · JCR Q1 · BIOCHEMICAL RESEARCH METHODS

Briefings in Bioinformatics, vol. 26, issue 5 · Pub Date: 2025-08-31 · DOI: 10.1093/bib/bbaf455

Shenjie Wang, Xiaonan Wang, Xiaoyan Zhu, Xuwen Wang, Yuqian Liu, Minchao Zhao, Zhili Chang, Yang Shao, Haitao Zhang, Shuanying Yang, Jiayin Wang



Abstract

Accurate tumor mutation burden (TMB) quantification is critical for immunotherapy stratification, yet remains challenging due to variability across sequencing platforms, tumor heterogeneity, and variant calling pipelines. Here, we introduce TMBquant, an explainable AI-powered caller designed to optimize TMB estimation through dynamic feature selection, ensemble learning, and automated strategy adaptation. Built upon the H2O AutoML framework, TMBquant integrates variant features, minimizes classification errors, and enhances both accuracy and stability across diverse datasets. We benchmarked TMBquant against nine widely used variant callers, including traditional tools (e.g. Mutect2, VarScan2, Strelka2) and recent AI-based methods (DeepSomatic, Octopus), using 706 whole-exome sequencing tumor-control pairs. To evaluate clinical relevance, we further assessed TMBquant through survival analyses across immunotherapy-treated cohorts of non-small cell lung cancer (NSCLC), nasopharyngeal carcinoma (NPC), and the two NSCLC subtypes: lung adenocarcinoma and lung squamous cell carcinoma. In each cohort, TMBquant consistently achieved the highest hazard ratios, demonstrating superior patient stratification compared to all other methods. Importantly, TMBquant maintained robust predictive performance across both high-TMB (NSCLC) and low-TMB (NPC) settings, highlighting its generalizability across cancer types with distinct biological characteristics. These findings establish TMBquant as a reliable, reproducible, and clinically actionable tool for precision oncology. The software is open source and freely available at https://github.com/SomaticCaller/SomaticCaller. To enhance reproducibility, we provide detailed usage instructions and representative code snippets for TMBquant in the Methods section (see Code Availability).
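For readers unfamiliar with the quantity being estimated, the following is a minimal sketch of how TMB is conventionally computed from a set of somatic calls: the number of retained mutations divided by the size of the sequenced target region in megabases. It is not TMBquant's method. The `Variant` record, the `passed_filter` flag, the 35 Mb target size, and the 10 mutations/Mb cutoff are illustrative assumptions and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal record for one somatic call (not TMBquant's data model)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    passed_filter: bool  # True if the upstream caller marked the call as PASS


def tumor_mutation_burden(variants: Iterable[Variant], target_size_bp: int) -> float:
    """Return TMB in mutations per megabase over the sequenced target region."""
    if target_size_bp <= 0:
        raise ValueError("target_size_bp must be positive")
    # Count only calls that survived the caller's filters.
    n_somatic = sum(1 for v in variants if v.passed_filter)
    return n_somatic / (target_size_bp / 1_000_000)


def stratify(tmb: float, cutoff: float = 10.0) -> str:
    """Label a sample TMB-high or TMB-low; the default cutoff is an assumption."""
    return "TMB-high" if tmb >= cutoff else "TMB-low"


if __name__ == "__main__":
    # Assumed example: 420 passing calls and 30 filtered calls on a 35 Mb exome target.
    calls = [Variant("chr1", 1000 + i, "A", "T", True) for i in range(420)]
    calls += [Variant("chr2", 5000 + i, "G", "C", False) for i in range(30)]
    tmb = tumor_mutation_burden(calls, target_size_bp=35_000_000)
    print(f"{tmb:.1f} mutations/Mb -> {stratify(tmb)}")
```

Because the numerator depends entirely on which calls are retained, two callers run on the same sample can yield different TMB values and therefore different strata, which is the source of variability the abstract describes.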


Source journal

Briefings in bioinformatics (Biology: Biochemical Research Methods)

CiteScore: 13.20
Self-citation rate: 13.70%
Articles published: 549
Review time: 6 months

About the journal: Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data. The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.