Authors: Shuohao Gao; Kaiqiang Yu; Shengxin Liu; Cheng Long; Xun Zhou
Journal: IEEE Transactions on Knowledge and Data Engineering, vol. 37, no. 8, pp. 4743-4757
DOI: 10.1109/TKDE.2025.3569755
Publication date: 2025-03-16
Publication type: Journal Article
Impact factor: 10.4 (JCR Q1, Computer Science, Artificial Intelligence)
Source: https://ieeexplore.ieee.org/document/11006014/
Citations: 0
Abstract
On Searching and Querying Maximum Directed $(k,\ell)$-Plex
Finding cohesive subgraphs in a directed graph is a fundamental approach to analyzing directed graph data. We consider a new model of a cohesive directed subgraph called the directed $(k,\ell)$-plex, which generalizes the $k$-plex, a concept applicable only to undirected graphs. A directed $(k,\ell)$-plex (or DPlex) imposes connection requirements on both the outbound and inbound directions of each vertex inside it, i.e., each vertex fails to point to at most $k$ vertices and is not pointed to by at most $\ell$ vertices. In this paper, we study the maximum DPlex search problem, which asks for a DPlex with the most vertices. We formally prove that the problem is NP-hard. We then design a heuristic algorithm called DPHeuris, which runs in polynomial time, is fast in practice, and finds a DPlex whose size is close to the maximum. Furthermore, we propose a branch-and-bound algorithm called DPBB to find an exact maximum DPlex, and develop effective graph reduction strategies to boost its empirical performance. We also consider the problem of querying a personalized maximum DPlex and design a new method called DPBBQ for it. Finally, we conduct extensive experiments on real directed graphs. The experimental results show that (1) our heuristic method quickly finds a near-optimal solution and (2) our branch-and-bound method runs up to six orders of magnitude faster than the baselines.
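
The DPlex condition in the abstract can be made concrete with a small check. The following Python sketch is illustrative only and is not the paper's DPHeuris, DPBB, or DPBBQ. It assumes, by analogy with the undirected $k$-plex, that each vertex counts itself among the vertices it does not point to, so every vertex in a set $S$ needs at least $|S|-k$ out-neighbors and at least $|S|-\ell$ in-neighbors inside $S$. The function names `is_dplex` and `max_dplex_bruteforce` and the adjacency-dictionary input are hypothetical choices made for this example.

```python
from itertools import combinations


def is_dplex(adj, S, k, l):
    """Return True if vertex set S induces a directed (k, l)-plex.

    adj maps each vertex to the set of vertices it points to.
    Assumed convention: a vertex counts itself as "not pointed to", so each
    v in S needs out-degree >= |S| - k and in-degree >= |S| - l within S.
    """
    S = set(S)
    n = len(S)
    for v in S:
        out_deg = sum(1 for u in S if u != v and u in adj.get(v, ()))
        in_deg = sum(1 for u in S if u != v and v in adj.get(u, ()))
        if out_deg < n - k or in_deg < n - l:
            return False
    return True


def max_dplex_bruteforce(adj, k, l):
    """Exhaustive search for a maximum DPlex (exponential time).

    Only meant for tiny graphs; it illustrates the problem statement,
    not an efficient branch-and-bound method.
    """
    vertices = sorted(set(adj) | {u for nbrs in adj.values() for u in nbrs})
    for size in range(len(vertices), 0, -1):
        for S in combinations(vertices, size):
            if is_dplex(adj, S, k, l):
                return set(S)
    return set()


# Toy graph: every arc among a, b, c is present except c -> b.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a"}}
```

On this toy graph, `is_dplex(toy, {"a", "b", "c"}, 1, 1)` is false because `c` has only one out-neighbor where two are required, while relaxing to `k = l = 2` makes the same set qualify. The exhaustive search enumerates all vertex subsets, which is consistent with the NP-hardness stated in the abstract and shows why heuristics and pruning are needed on real graphs.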
About the journal:
The IEEE Transactions on Knowledge and Data Engineering encompasses knowledge and data engineering aspects within computer science, artificial intelligence, electrical engineering, computer engineering, and related fields. It provides an interdisciplinary platform for disseminating new developments in knowledge and data engineering and explores the practicality of these concepts in both hardware and software. Specific areas covered include knowledge-based and expert systems, AI techniques for knowledge and data management, tools and methodologies, distributed processing, real-time systems, architectures, data management practices, database design, query languages, security, fault tolerance, statistical databases, algorithms, performance evaluation, and applications.