Mining contextually meaningful subgraphs from a vertex-attributed graph.

IF 2.9 3区生物学 Q2 BIOCHEMICAL RESEARCH METHODS

BMC Bioinformatics Pub Date : 2024-11-14 DOI:10.1186/s12859-024-05960-x

Riyad Hakim, Saeed Salem

{"title":"Mining contextually meaningful subgraphs from a vertex-attributed graph.","authors":"Riyad Hakim, Saeed Salem","doi":"10.1186/s12859-024-05960-x","DOIUrl":null,"url":null,"abstract":"<p><p>Networks have emerged as a natural data structure to represent relations among entities. Proteins interact to carry out cellular functions and protein-Protein interaction network analysis has been employed for understanding the cellular machinery. Advances in genomics technologies enabled the collection of large data that annotate proteins in interaction networks. Integrative analysis of interaction networks with gene expression and annotations enables the discovery of context-specific complexes and improves the identification of functional modules and pathways. Extracting subnetworks whose vertices are connected and have high attribute similarity have applications in diverse domains. We present an enumeration approach for mining sets of connected and cohesive subgraphs, where vertices in the subgraphs have similar attribute profile. Due to the large number of cohesive connected subgraphs and to overcome the overlap among these subgraphs, we propose an algorithm for enumerating a set of representative subgraphs, the set of all closed subgraphs. We propose pruning strategies for efficiently enumerating the search tree without missing any pattern or reporting duplicate subgraphs. On a real protein-protein interaction network with attributes representing the dysregulation profile of genes in multiple cancers, we mine closed cohesive connected subnetworks and show their biological significance. Moreover, we conduct a runtime comparison with existing algorithms to show the efficiency of our proposed algorithm.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"25 1","pages":"356"},"PeriodicalIF":2.9000,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11566210/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"BMC Bioinformatics","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1186/s12859-024-05960-x","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}

引用次数: 0

Abstract

Networks have emerged as a natural data structure to represent relations among entities. Proteins interact to carry out cellular functions and protein-Protein interaction network analysis has been employed for understanding the cellular machinery. Advances in genomics technologies enabled the collection of large data that annotate proteins in interaction networks. Integrative analysis of interaction networks with gene expression and annotations enables the discovery of context-specific complexes and improves the identification of functional modules and pathways. Extracting subnetworks whose vertices are connected and have high attribute similarity have applications in diverse domains. We present an enumeration approach for mining sets of connected and cohesive subgraphs, where vertices in the subgraphs have similar attribute profile. Due to the large number of cohesive connected subgraphs and to overcome the overlap among these subgraphs, we propose an algorithm for enumerating a set of representative subgraphs, the set of all closed subgraphs. We propose pruning strategies for efficiently enumerating the search tree without missing any pattern or reporting duplicate subgraphs. On a real protein-protein interaction network with attributes representing the dysregulation profile of genes in multiple cancers, we mine closed cohesive connected subnetworks and show their biological significance. Moreover, we conduct a runtime comparison with existing algorithms to show the efficiency of our proposed algorithm.

查看原文本刊更多论文

从顶点属性图中挖掘有上下文意义的子图。

网络已成为表示实体间关系的一种自然数据结构。蛋白质相互作用实现细胞功能，蛋白质-蛋白质相互作用网络分析已被用于了解细胞机制。基因组学技术的进步使得人们能够收集大量数据，将蛋白质注释到相互作用网络中。将相互作用网络与基因表达和注释进行整合分析，可以发现特定的复合体，并改进功能模块和通路的识别。提取顶点相连且属性相似度高的子网络可应用于不同领域。我们提出了一种枚举法，用于挖掘连接且内聚的子图集，其中子图中的顶点具有相似的属性特征。由于内聚连接子图的数量庞大，为了克服这些子图之间的重叠，我们提出了一种枚举代表性子图集合（即所有封闭子图的集合）的算法。我们提出了剪枝策略，以高效地枚举搜索树，而不会遗漏任何模式或报告重复的子图。在一个真实的蛋白质-蛋白质相互作用网络上，我们挖掘出了封闭内聚连接子网，并展示了它们的生物学意义。此外，我们还与现有算法进行了运行时间比较，以显示我们提出的算法的效率。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

BMC Bioinformatics 生物-生化研究方法

CiteScore

5.70

自引率

3.30%

发文量

506

审稿时长

4.3 months

期刊介绍： BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology. BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.