HallmarkGraph: a cancer hallmark informed graph neural network for classifying hierarchical tumor subtypes.

Impact factor: 5.4

Bioinformatics (Oxford, England). Pub Date: 2025-09-01. DOI: 10.1093/bioinformatics/btaf444

Qingsong Zhang, Fei Liu, Xin Lai

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12401579/pdf/

Citations: 0

Abstract

Motivation: Accurate tumor subtype diagnosis is crucial for precision oncology, yet current methodologies face significant challenges. These include balancing model accuracy with interpretability and the high costs of generating multi-omics data in clinical settings. Moreover, there is a lack of validated models capable of classifying hierarchical tumor subtypes across a comprehensive pan-cancer cohort.

Results: We present HallmarkGraph, a graph neural network and the first biologically informed model developed to classify hierarchical tumor subtypes in human cancer. Inspired by cancer hallmarks, the model's architecture integrates transcriptome profiles and gene regulatory interactions to perform multi-label classification. We evaluate the model on a comprehensive pan-cancer cohort comprising 11 476 samples from 26 primary cancers, with 405 subtypes organized in up to eight hierarchical levels. The model achieves 5-fold cross-validation accuracy between 85% and 99% for tumor subtypes labeled with increasing levels of genomic detail. It also generalizes well to a validation dataset of 887 samples, assessed using three metrics that consider tumor subtypes at the individual, combined, and sample levels. Benchmarking and ablation experiments show that hallmark-based embeddings have only a slight effect on model performance, whereas the integrated multilayer perceptron plays a significant role in determining classifier accuracy. Additionally, we use the SHAP method to link cancer hallmarks with genes, identifying key features that influence model decisions. Our findings present a biologically informed machine learning framework capable of tracking tumor transcriptomic trajectories and distinguishing inter- and intra-tumor heterogeneity in pan-cancer. This approach holds promise for enhancing cancer diagnostics.
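The abstract mentions multi-label classification of hierarchical subtypes and three evaluation levels (individual, combined, sample) without defining them. The sketch below shows one plausible reading, stated here as an assumption and not as the authors' definitions: each sample's path through the hierarchy becomes a multi-hot vector, "individual" is per-label accuracy, "combined" is accuracy pooled over all sample-label pairs, and "sample" is the exact-match rate. The label names, data, and function names are hypothetical toy values.

```python
from typing import Dict, List, Sequence

def encode_path(path: Sequence[str], label_index: Dict[str, int]) -> List[int]:
    """Multi-hot vector with a 1 for every subtype on the sample's hierarchy path."""
    vec = [0] * len(label_index)
    for label in path:
        vec[label_index[label]] = 1
    return vec

def per_label_accuracy(y_true: List[List[int]], y_pred: List[List[int]]) -> List[float]:
    """Assumed 'individual' level: accuracy computed separately for each label column."""
    n = len(y_true)
    n_labels = len(y_true[0])
    return [sum(t[j] == p[j] for t, p in zip(y_true, y_pred)) / n for j in range(n_labels)]

def pooled_accuracy(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Assumed 'combined' level: accuracy pooled over all (sample, label) pairs."""
    total = sum(len(t) for t in y_true)
    correct = sum(a == b for t, p in zip(y_true, y_pred) for a, b in zip(t, p))
    return correct / total

def exact_match(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Assumed 'sample' level: fraction of samples whose whole label vector is correct."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy two-level hierarchy (hypothetical labels, not the paper's 405 subtypes).
labels = ["TumorA", "TumorA.sub1", "TumorA.sub2", "TumorB"]
label_index = {name: i for i, name in enumerate(labels)}

paths = [
    ["TumorA", "TumorA.sub1"],
    ["TumorA", "TumorA.sub2"],
    ["TumorB"],
    ["TumorA", "TumorA.sub1"],
]
y_true = [encode_path(p, label_index) for p in paths]

# Hypothetical predictions: the second sample gets the wrong fine-grained subtype.
y_pred = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]

print(per_label_accuracy(y_true, y_pred))  # [1.0, 0.75, 0.75, 1.0]
print(pooled_accuracy(y_true, y_pred))     # 0.875
print(exact_match(y_true, y_pred))         # 0.75
```

In this toy case a single mistake at the fine-grained level leaves the coarse label correct, so the pooled score (0.875) stays above the exact-match score (0.75). That gap is one reason to report several levels for hierarchical labels.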

Availability and implementation: HallmarkGraph is accessible at https://github.com/laixn/HallmarkGraph.


Supplementary information: Supplementary data are available at Bioinformatics online.
