Benchmarking computational methods for multi-omics biomarker discovery in cancer.

IF 7.7 · CAS Zone 2 (Biology) · JCR Q1 (Biochemical Research Methods)

Briefings in Bioinformatics · Pub Date: 2026-05-04 · DOI: 10.1093/bib/bbag200

Athan Z Li, Yuxuan Du, Yan Liu, Liang Chen, Ruishan Liu

Volume 27, Issue 3 · Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13147463/pdf/


Abstract

Multi-omics profiling characterizes cancer biology and supports biomarker discovery for prognosis and therapy selection. Although numerous computational multi-omics biomarker identification methods have been proposed, their ability to identify clinically relevant biomarkers has not been systematically evaluated, leaving it unclear whether the resulting biomarker nominations are reliable for downstream validation. Here, we systematically benchmark 20 representative statistical, machine learning and deep learning methods using curated gold-standard prognostic and therapeutic biomarkers across five real-world datasets. We evaluate performance in terms of both biomarker identification accuracy and stability. Overall, DeePathNet and DeepKEGG achieve the best performance. Across methods, effective biomarker recovery is associated with the integration of biological knowledge, global feature interactions, multivariate feature attribution, and effective regularization. Analysis of omics type contributions reveals method- and modality-specific biases, highlighting the importance of broader omics integration. We further evaluate methods on simulated datasets to probe sensitivity with controlled signal and noise. By aggregating results from top-performing methods, we construct consensus biomarker panels that nominate candidates for potential investigations. Finally, we provide user-friendly interfaces to allow researchers to benchmark new methods against the 20 baselines or apply selected methods for biomarker identification on custom multi-omics datasets. Our benchmark is publicly available at https://github.com/athanzli/CancerMOBI-Bench.
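The abstract does not say how identification accuracy, stability, or the consensus panels are computed. The sketch below illustrates one plausible set of choices, all of them assumptions rather than the paper's method: recall of gold-standard biomarkers among the top-k ranked features as the accuracy measure, mean pairwise Jaccard overlap of top-k sets across repeated runs as the stability measure, and Borda-count rank aggregation to build a consensus panel. The function names and toy gene lists are hypothetical and are not taken from the paper or the CancerMOBI-Bench repository.

```python
from collections import defaultdict
from itertools import combinations


def recall_at_k(ranked_features, gold_standard, k):
    """Assumed accuracy measure: fraction of gold-standard biomarkers in the top-k."""
    gold = set(gold_standard)
    if not gold:
        return 0.0
    return len(set(ranked_features[:k]) & gold) / len(gold)


def jaccard(a, b):
    """Jaccard similarity of two feature collections (1.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def topk_stability(rankings, k):
    """Assumed stability measure: mean pairwise Jaccard of top-k sets across runs."""
    tops = [ranking[:k] for ranking in rankings]
    pairs = list(combinations(tops, 2))
    if not pairs:
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def borda_consensus(rankings, k):
    """Hypothetical consensus panel: Borda-count aggregation of several rankings.

    A feature at position p in a list of length n earns n - p points; features
    absent from a list earn nothing from it. Ties are broken alphabetically.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        n = len(ranking)
        for position, feature in enumerate(ranking):
            scores[feature] += n - position
    ordered = sorted(scores, key=lambda f: (-scores[f], f))
    return ordered[:k]


# Toy, hypothetical inputs for illustration only (not data from the paper).
gold = {"TP53", "EGFR", "BRCA1"}
run_a = ["TP53", "MYC", "EGFR", "KRAS", "BRCA1"]
run_b = ["TP53", "EGFR", "MYC", "PTEN", "BRCA1"]

print(recall_at_k(run_a, gold, 3))           # 2 of 3 gold markers in the top 3
print(topk_stability([run_a, run_b], 3))     # identical top-3 sets across runs
print(borda_consensus([run_a, run_b], 3))
```

Here the two toy rankings agree on their top three features, so top-3 stability is perfect, while the full top-5 sets share only four of six distinct features. In a real benchmark, each ranking would come from a method's feature-attribution scores, and the choice of k, metric, and aggregation rule would need to follow the paper's own protocol.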




Source journal

Briefings in Bioinformatics (Biology: Biochemical Research Methods)

CiteScore: 13.20

Self-citation rate: 13.70%

Articles published: 549

Review time: 6 months

About the journal: Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists who apply their expertise to biological problems. The journal focuses on reviews tailored to users of databases and analytical tools in contemporary genetics, molecular biology, and systems biology. It stands out by offering practical assistance and guidance in computational methods to non-specialists. Ranging from introductory concepts to specific protocols and analyses, its papers address bacterial, plant, fungal, animal, and human data. The journal's subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.