Cox-Sage: enhancing Cox proportional hazards model with interpretable graph neural networks for cancer prognosis.

IF 6.8 | Zone 2 (Biology) | JCR Q1 | Biochemical Research Methods

Briefings in Bioinformatics | Pub Date: 2025-03-04 | DOI: 10.1093/bib/bbaf108

Ruijun Mao, Li Wan, Minghao Zhou, Dongxi Li

Volume 26, issue 2. PMC full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11894944/pdf/

Citations: 0

Abstract



High-throughput sequencing technologies have facilitated a deeper exploration of prognostic biomarkers. Many deep learning (DL) methods focus primarily on feature extraction or employ simplistic fully connected layers within their prognostic modules, and the features they extract can be difficult to interpret. To address these challenges, we propose an interpretable cancer prognosis model called Cox-Sage. Specifically, we first propose an algorithm to construct a patient similarity graph from heterogeneous clinical data, and then extract protein-coding genes from each patient's gene expression data and embed them as features in the graph nodes. We use multilayer graph convolution to model the proportional hazards pattern and introduce a mathematical method that explains the meaning of the model's parameters. Based on this approach, we propose two metrics that measure gene importance from different perspectives: the mean hazard ratio and the reciprocal of the mean hazard ratio. These metrics can be used to discover two types of important genes: genes whose low expression levels are associated with high cancer prognosis risk, and genes whose high expression levels are associated with high cancer prognosis risk. We conducted experiments on seven datasets from TCGA, and our model achieved superior prognostic performance compared with some state-of-the-art methods. As a preliminary study, we performed prognostic biomarker discovery on the LIHC (Liver Hepatocellular Carcinoma) dataset. Our code and dataset can be found at https://github.com/beeeginner/Cox-sage.
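To make the ideas in the abstract concrete, the following is a minimal sketch of the general technique, not the authors' implementation (which is in the repository linked above). It assumes a single mean-aggregation step over a patient graph, a linear risk score, and the standard Cox negative partial log-likelihood as the training objective. All function names, the toy data, and the coefficients `beta` are hypothetical. The last line uses the textbook Cox reading of a coefficient, where `exp(beta)` is the hazard ratio for a one-unit increase in a feature. The paper's own "mean hazard ratio" metric is not defined in the abstract and is not reproduced here.

```python
import math

def cox_neg_partial_log_likelihood(risk, time, event):
    """Negative Cox partial log-likelihood for log-hazard scores `risk`."""
    loss = 0.0
    for i, observed in enumerate(event):
        if not observed:
            continue  # censored patients contribute only through risk sets
        # Risk set: every patient whose follow-up time is at least time[i]
        risk_set = [risk[j] for j in range(len(time)) if time[j] >= time[i]]
        top = max(risk_set)  # log-sum-exp shift for numerical stability
        log_denominator = top + math.log(sum(math.exp(r - top) for r in risk_set))
        loss -= risk[i] - log_denominator
    return loss

def mean_aggregate(features, adjacency):
    """Average each patient's features with those of its graph neighbours."""
    n = len(features)
    out = []
    for i in range(n):
        group = [i] + [j for j in range(n) if adjacency[i][j] and j != i]
        out.append([sum(features[j][k] for j in group) / len(group)
                    for k in range(len(features[0]))])
    return out

def linear_risk(features, beta):
    """Linear log-hazard score: dot product of each feature row with beta."""
    return [sum(x * b for x, b in zip(row, beta)) for row in features]

# Toy data invented for illustration: 4 patients, 2 gene-expression features.
expression = [[2.0, 0.5], [1.5, 1.0], [0.2, 2.0], [0.1, 2.5]]
adjacency = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
time = [5.0, 8.0, 20.0, 24.0]   # follow-up times
event = [1, 1, 0, 1]            # 1 = event observed, 0 = censored
beta = [0.8, -0.4]              # hypothetical fitted coefficients

smoothed = mean_aggregate(expression, adjacency)
risk = linear_risk(smoothed, beta)
loss = cox_neg_partial_log_likelihood(risk, time, event)
hazard_ratios = [math.exp(b) for b in beta]
print(loss, hazard_ratios)
```

Under these assumptions, a hazard ratio above 1 marks a feature whose higher values go with higher risk, and a ratio below 1 marks one whose lower values go with higher risk, which mirrors the two gene types the abstract describes.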


Source journal

Briefings in Bioinformatics (Biology: Biochemical Research Methods)

CiteScore: 13.20
Self-citation rate: 13.70%
Articles published: 549
Review time: 6 months

About the journal: Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data. The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.