GNNMutation: a heterogeneous graph-based framework for cancer detection.

IF 3.3; Region 3 (Biology); JCR Q2 (Biochemical Research Methods)

BMC Bioinformatics, volume 26(1), article 153. Pub Date: 2025-06-04. DOI: 10.1186/s12859-025-06133-0

Nuriye Özlem Özcan Şimşek, Arzucan Özgür, Fikret Gürgen

Citations: 0

Abstract

Background: When genes are translated into proteins, mutations in the gene sequence can change protein structure and function as well as the interactions between proteins. These changes can disrupt cell function and contribute to tumor development. In this study, we introduce a novel approach based on graph neural networks that jointly considers genetic mutations and protein interactions for cancer prediction. We use DNA mutations from whole exome sequencing data to construct a heterogeneous graph in which patients and proteins are nodes and protein-protein interactions are edges. Patient nodes are additionally connected to protein nodes based on mutations in the patient's DNA. Each patient node is represented by a feature vector derived from the mutations in specific genes. The feature values are calculated using a weighting scheme inspired by information retrieval, in which whole genomes are treated as documents and mutations as words within these documents. The weight of each gene, determined by its mutations, reflects its contribution to disease development. Patient nodes are updated through both mutations and protein interactions within our novel heterogeneous graph structure. Because each mutation affects disease development differently, we process the input graph with attention-based graph neural networks.
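The weighting scheme and graph layout described above can be illustrated with a minimal sketch. The abstract does not give the exact formula, so this assumes the classic TF-IDF form (term frequency times log inverse document frequency) applied at gene level. It also assumes, for simplicity, that a gene symbol doubles as the identifier of its protein node. The toy data, the edge-type names, and the function names `tfidf_gene_weights` and `build_hetero_graph` are hypothetical and not taken from the paper.

```python
import math
from collections import defaultdict

# Hypothetical toy input: patient -> {gene: number of mutations in that gene}.
mutations = {
    "p1": {"BRCA1": 2, "TP53": 1},
    "p2": {"TP53": 1},
    "p3": {"KRAS": 3, "TP53": 1},
}
# Hypothetical protein-protein interaction pairs.
ppi = [("BRCA1", "TP53"), ("KRAS", "TP53")]


def tfidf_gene_weights(mutations):
    """Assumed classic TF-IDF, with patients as documents and genes as terms.

    tf  = share of a patient's mutations that fall in the gene
    idf = log(number of patients / number of patients mutated in the gene)
    """
    n_patients = len(mutations)
    doc_freq = defaultdict(int)
    for genes in mutations.values():
        for gene in genes:
            doc_freq[gene] += 1
    weights = {}
    for patient, genes in mutations.items():
        total = sum(genes.values())
        weights[patient] = {
            gene: (count / total) * math.log(n_patients / doc_freq[gene])
            for gene, count in genes.items()
        }
    return weights


def build_hetero_graph(mutations, ppi):
    """Return node lists and typed edge lists for a patient/protein graph."""
    proteins = {g for pair in ppi for g in pair}
    proteins |= {g for genes in mutations.values() for g in genes}
    nodes = {"patient": sorted(mutations), "protein": sorted(proteins)}
    edges = {
        # Protein-protein interaction edges.
        ("protein", "interacts", "protein"): list(ppi),
        # A patient is linked to every protein whose gene is mutated.
        ("patient", "mutated_in", "protein"): [
            (p, g) for p in sorted(mutations) for g in sorted(mutations[p])
        ],
    }
    return nodes, edges


weights = tfidf_gene_weights(mutations)
nodes, edges = build_hetero_graph(mutations, ppi)
```

Under this assumed formula, a gene mutated in every patient (TP53 in the toy data) receives weight zero, mirroring how TF-IDF discounts words that occur in every document. The typed edge lists could then be passed to an attention-based graph neural network library of choice.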

Results: We compiled a dataset from the UK Biobank in which patients with a cancer diagnosis form the case group and those without a cancer diagnosis form the control group. We evaluated our approach on the four most common cancer types (breast, prostate, lung, and colon cancer) and showed that the proposed framework effectively discriminates between case and control groups.

Conclusions: The results indicate that our proposed graph structure and node updating strategy improve cancer classification performance. Additionally, we extended our system with an explainer that identifies a list of causal genes that influence the model's cancer diagnosis predictions. Notably, some of these genes have already been studied in cancer research, demonstrating the system's ability to recognize causal genes for the selected cancer types and make predictions based on them.

Source journal

BMC Bioinformatics (Biology: Biochemical Research Methods)

CiteScore: 5.70
Self-citation rate: 3.30%
Articles published: 506
Review time: 4.3 months

About the journal: BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology. BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.