TMBquant: an explainable AI-powered caller advancing tumor mutation burden quantification across heterogeneous samples.

IF 7.7 | Zone 2 (Biology) | JCR Q1, Biochemical Research Methods
Shenjie Wang, Xiaonan Wang, Xiaoyan Zhu, Xuwen Wang, Yuqian Liu, Minchao Zhao, Zhili Chang, Yang Shao, Haitao Zhang, Shuanying Yang, Jiayin Wang
Briefings in Bioinformatics, 26(5). Published 2025-08-31. DOI: 10.1093/bib/bbaf455 (https://doi.org/10.1093/bib/bbaf455). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12415849/pdf/

Abstract

Accurate tumor mutation burden (TMB) quantification is critical for immunotherapy stratification, yet remains challenging due to variability across sequencing platforms, tumor heterogeneity, and variant calling pipelines. Here, we introduce TMBquant, an explainable AI-powered caller designed to optimize TMB estimation through dynamic feature selection, ensemble learning, and automated strategy adaptation. Built upon the H2O AutoML framework, TMBquant integrates variant features, minimizes classification errors, and enhances both accuracy and stability across diverse datasets. We benchmarked TMBquant against nine widely used variant callers, including traditional tools (e.g. Mutect2, VarScan2, Strelka2) and recent AI-based methods (DeepSomatic, Octopus), using 706 whole-exome sequencing tumor-control pairs. To evaluate clinical relevance, we further assessed TMBquant through survival analyses across immunotherapy-treated cohorts of non-small cell lung cancer (NSCLC), nasopharyngeal carcinoma (NPC), and the two NSCLC subtypes: lung adenocarcinoma and lung squamous cell carcinoma. In each cohort, TMBquant consistently achieved the highest hazard ratios, demonstrating superior patient stratification compared to all other methods. Importantly, TMBquant maintained robust predictive performance across both high-TMB (NSCLC) and low-TMB (NPC) settings, highlighting its generalizability across cancer types with distinct biological characteristics. These findings establish TMBquant as a reliable, reproducible, and clinically actionable tool for precision oncology. The software is open source and freely available at https://github.com/SomaticCaller/SomaticCaller. To enhance reproducibility, we provide detailed usage instructions and representative code snippets for TMBquant in the Methods section (see Code Availability).
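As a rough illustration of the quantity being estimated, TMB is commonly reported as the number of retained somatic variants per megabase of sequenced territory. The sketch below is not TMBquant's implementation: the `CandidateVariant` schema, the `somatic_probability` field, the 0.5 threshold, and the stratification cutoff are all hypothetical placeholders for whatever an upstream classifier and clinical protocol would provide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateVariant:
    """A candidate somatic variant (hypothetical schema, for illustration only)."""
    chrom: str
    pos: int
    somatic_probability: float  # assumed score from some upstream classifier


def tmb_per_megabase(variants, covered_bases, min_probability=0.5):
    """Count variants at or above the score threshold and normalise per megabase.

    covered_bases is the size of the sequenced target region in base pairs.
    The default threshold is an arbitrary placeholder, not a published value.
    """
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    n_passing = sum(1 for v in variants if v.somatic_probability >= min_probability)
    return n_passing / (covered_bases / 1_000_000)


def stratify(tmb, cutoff):
    """Dichotomise a TMB value; the cutoff is cohort-specific and assumed here."""
    return "TMB-high" if tmb >= cutoff else "TMB-low"


if __name__ == "__main__":
    calls = [
        CandidateVariant("chr1", 100, 0.9),
        CandidateVariant("chr2", 200, 0.8),
        CandidateVariant("chr3", 300, 0.3),  # below threshold, not counted
        CandidateVariant("chr4", 400, 0.6),
    ]
    tmb = tmb_per_megabase(calls, covered_bases=2_000_000)
    print(tmb, stratify(tmb, cutoff=10.0))  # 1.5 TMB-low
```

Because the count depends directly on which candidate variants are retained, classification errors in the caller propagate into the TMB value and into the high/low grouping used for survival analysis.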


Source journal: Briefings in Bioinformatics (Biology: Biochemical Research Methods)
CiteScore: 13.20
Self-citation rate: 13.70%
Articles published: 549
Review time: 6 months
Journal description: Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data. The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.