A graph neural network approach for hierarchical mapping of breast cancer protein communities.

IF 2.9 · CAS Zone 3 (Biology) · JCR Q2 (BIOCHEMICAL RESEARCH METHODS)

BMC Bioinformatics · Published: 2025-01-21 · DOI: 10.1186/s12859-024-06015-x

Xiao Zhang, Qian Liu

BMC Bioinformatics 26(1): 23. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11749236/pdf/

Citations: 0

Abstract

Background: Comprehensively mapping the hierarchical structure of breast cancer protein communities and identifying potential biomarkers from them is a promising avenue for breast cancer research. Existing approaches are subjective and do not take information from protein sequences into account. Deep learning can automatically learn features from protein sequences and protein-protein interactions for use in hierarchical clustering.
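The idea of learning features from sequences and interactions for hierarchical clustering can be illustrated with a minimal sketch of one DiffPool-style coarsening step. DiffPool is a published GNN pooling technique swapped in here purely for illustration; the abstract does not describe the authors' architecture. The sketch assumes a toy protein-protein interaction adjacency matrix `A` and per-protein feature vectors `X` standing in for sequence embeddings. Weights are random rather than trained, no Gene Ontology supervision is shown, and all function names are hypothetical.

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetric GCN normalization: D^{-1/2} (A + I) D^{-1/2}."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(A_norm, H, W):
    """One graph-convolution step: aggregate neighbours, project, apply ReLU."""
    return np.maximum(A_norm @ H @ W, 0.0)

def softmax_rows(Z):
    """Row-wise softmax, so each node's cluster memberships sum to 1."""
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def diffpool_level(A, X, W_embed, W_assign):
    """Coarsen a graph by one level with DiffPool-style soft clustering (hypothetical helper)."""
    A_norm = normalize_adjacency(A)
    Z = gcn_layer(A_norm, X, W_embed)        # node embeddings
    S = softmax_rows(A_norm @ X @ W_assign)  # soft node-to-cluster assignments (linear GCN step)
    X_coarse = S.T @ Z                       # cluster-level features
    A_coarse = S.T @ A @ S                   # cluster-level connectivity
    return S, X_coarse, A_coarse

# Toy PPI graph (hypothetical): two triangles of proteins joined by one edge.
rng = np.random.default_rng(0)
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
X = rng.standard_normal((6, 8))  # stand-in for per-protein sequence embeddings

# Level 1: 6 proteins -> 2 communities; level 2: 2 communities -> 1 root.
S, X1, A1 = diffpool_level(A, X, rng.standard_normal((8, 4)), rng.standard_normal((8, 2)))
S2, X2, A2 = diffpool_level(A1, X1, rng.standard_normal((4, 4)), rng.standard_normal((4, 1)))
```

Stacking levels in this way yields a tree from proteins to communities to a root. In a trained model, the assignment weights would be learned jointly with the embeddings under some supervision signal rather than drawn at random.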

Results: Using a large body of publicly available proteomics data, we built a hierarchical tree of breast cancer protein communities with a novel hierarchical graph neural network, supervised by Gene Ontology terms and assisted by a pre-trained deep contextual language model. A group-lasso algorithm was then applied to identify protein communities that are under both mutation burden and survival burden, undergo significant alterations when targeted by specific drug molecules, and show cancer-dependent perturbations. The resulting hierarchical map shows how gene-level mutation and survival information converge on protein communities at different scales. The internal validity of the model was established through its convergence on BRCA2 as a breast cancer hotspot. Further overlaps with breast cancer cell dependencies revealed SUPT6H and RAD21, together with their respective protein systems HOST:37 and HOST:861, as potential biomarkers. Using gene-level perturbation data for the HOST:37 and HOST:861 gene sets, three FDA-approved drugs with high therapeutic value were selected as potential treatments for further evaluation: mercaptopurine, pioglitazone, and colchicine.
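The group-lasso step can be sketched as a minimal proximal-gradient solver for a least-squares group lasso. It assumes that each protein community is a group of gene-level features and that the outcome is a single continuous score. The abstract does not state which loss, penalty weights, or outcome variables the authors used, so the data below are synthetic and the function names are hypothetical. The penalty lam * sum_g sqrt(|g|) * ||beta_g||_2 shrinks whole groups to exactly zero, so a community is kept or discarded as a unit.

```python
import numpy as np

def group_soft_threshold(v, thresh):
    """Proximal operator of thresh * ||v||_2: shrink v toward 0, or zero it entirely."""
    norm = np.linalg.norm(v)
    if norm <= thresh:
        return np.zeros_like(v)
    return (1.0 - thresh / norm) * v

def group_lasso(X, y, groups, lam, n_iter=2000):
    """Proximal gradient for (1/2n)||y - X b||^2 + lam * sum_g sqrt(|g|) ||b_g||_2.

    `groups` is assumed to partition the columns of X (hypothetical gene sets).
    """
    n, p = X.shape
    beta = np.zeros(p)
    # Step size 1/L, where L is the Lipschitz constant of the least-squares gradient.
    step = 1.0 / np.linalg.eigvalsh(X.T @ X / n).max()
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        v = beta - step * grad
        for g in groups:
            beta[g] = group_soft_threshold(v[g], step * lam * np.sqrt(len(g)))
    return beta

# Synthetic example: three hypothetical "communities" of three genes each.
rng = np.random.default_rng(0)
n = 200
groups = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
X = rng.standard_normal((n, 9))                      # toy gene-level features per sample
true_beta = np.array([2.0, -1.5, 1.0, 0, 0, 0, 0, 0, 0])
y = X @ true_beta + 0.1 * rng.standard_normal(n)     # toy outcome driven by community 0 only

beta = group_lasso(X, y, groups, lam=0.2)
selected = [i for i, g in enumerate(groups) if np.linalg.norm(beta[g]) > 0]
print(selected)
```

Because the outcome here depends only on the first group, the solver is expected to keep that community and zero out the other two. Larger `lam` values prune more aggressively.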

Conclusion: The proposed graph neural network approach, which analyzes breast cancer protein communities within a hierarchical structure, provides a novel perspective on breast cancer prognosis and treatment. By targeting entire gene sets, we were able to evaluate the prognostic and therapeutic value of genes (or gene sets) at different levels, from gene-level to system-level biology. Cancer-specific gene dependencies provide additional context for pinpointing cancer-related systems, and drug-induced alterations can highlight potential therapeutic targets. The identified protein communities, together with other protein communities under strong mutation and survival burdens, could potentially serve as clinical biomarkers for breast cancer.

Source journal: BMC Bioinformatics (Biology: Biochemical Research Methods)

CiteScore: 5.70

Self-citation rate: 3.30%

Articles published: 506

Review time: 4.3 months

Journal description: BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology. BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.