{"title":"理论与实践高效并行核分解(摘要)","authors":"Jessica Shi, Laxman Dhulipala, Julian Shun","doi":"10.1145/3597635.3598024","DOIUrl":null,"url":null,"abstract":"Discovering dense substructures in graphs is a fundamental topic in graph mining, and has been studied across many areas including computational biology, spam and fraud-detection, and large-scale network analysis. Recently, Sariyuce et al. introduced the nucleus decomposition problem, which generalizes the influential notions of k-cores and k-trusses to k-(r,s) nucleii, and can better capture higher-order structures. Informally, a k-(r,s) nucleus is the maximal induced subgraph such that every r-clique in the subgraph is contained in at least k s-cliques. The goal of the (r, s) nucleus decomposition problem is to identify for each r-clique in the graph, the largest k such that it is in a k-(r,s) nucleus. Solving the (r, s) nucleus decomposition problem is a significant computational challenge for several reasons. First, simply counting and enumerating s-cliques is a challenging task, even for modest s. Second, storing information for all r-cliques can require a large amount of space, even for relatively small graphs. Third, engineering fast and high-performance solutions to this problem requires taking advantage of parallelism due to the computationally-intensive nature of listing cliques. There are two well-known parallel paradigms for approaching the (r, s) nucleus decomposition problem, a global peeling-based model and a local update model that iterates until convergence. The former is inherently challenging to parallelize due to sequential dependencies and necessary synchronization steps, which we address in this paper, and we demonstrate that the latter requires orders of magnitude more work to converge to the same solution and is thus less performant.","PeriodicalId":185981,"journal":{"name":"Proceedings of the 2023 ACM Workshop on Highlights of Parallel Computing","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Theoretically and Practically Efficient Parallel Nucleus Decomposition (Abstract)\",\"authors\":\"Jessica Shi, Laxman Dhulipala, Julian Shun\",\"doi\":\"10.1145/3597635.3598024\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Discovering dense substructures in graphs is a fundamental topic in graph mining, and has been studied across many areas including computational biology, spam and fraud-detection, and large-scale network analysis. Recently, Sariyuce et al. introduced the nucleus decomposition problem, which generalizes the influential notions of k-cores and k-trusses to k-(r,s) nucleii, and can better capture higher-order structures. Informally, a k-(r,s) nucleus is the maximal induced subgraph such that every r-clique in the subgraph is contained in at least k s-cliques. The goal of the (r, s) nucleus decomposition problem is to identify for each r-clique in the graph, the largest k such that it is in a k-(r,s) nucleus. Solving the (r, s) nucleus decomposition problem is a significant computational challenge for several reasons. First, simply counting and enumerating s-cliques is a challenging task, even for modest s. Second, storing information for all r-cliques can require a large amount of space, even for relatively small graphs. 
Third, engineering fast and high-performance solutions to this problem requires taking advantage of parallelism due to the computationally-intensive nature of listing cliques. There are two well-known parallel paradigms for approaching the (r, s) nucleus decomposition problem, a global peeling-based model and a local update model that iterates until convergence. The former is inherently challenging to parallelize due to sequential dependencies and necessary synchronization steps, which we address in this paper, and we demonstrate that the latter requires orders of magnitude more work to converge to the same solution and is thus less performant.\",\"PeriodicalId\":185981,\"journal\":{\"name\":\"Proceedings of the 2023 ACM Workshop on Highlights of Parallel Computing\",\"volume\":\"24 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-07-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2023 ACM Workshop on Highlights of Parallel Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3597635.3598024\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2023 ACM Workshop on Highlights of Parallel Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3597635.3598024","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Theoretically and Practically Efficient Parallel Nucleus Decomposition (Abstract)
Discovering dense substructures in graphs is a fundamental topic in graph mining, and it has been studied across many areas including computational biology, spam and fraud detection, and large-scale network analysis. Recently, Sariyuce et al. introduced the nucleus decomposition problem, which generalizes the influential notions of k-cores and k-trusses to k-(r,s) nuclei, and can better capture higher-order structures. Informally, a k-(r,s) nucleus is a maximal induced subgraph in which every r-clique is contained in at least k s-cliques. The goal of the (r,s) nucleus decomposition problem is to identify, for each r-clique in the graph, the largest k such that the r-clique is in a k-(r,s) nucleus. Solving the (r,s) nucleus decomposition problem is a significant computational challenge for several reasons. First, simply counting and enumerating s-cliques is a challenging task, even for modest s. Second, storing information for all r-cliques can require a large amount of space, even for relatively small graphs. Third, engineering fast and high-performance solutions to this problem requires exploiting parallelism due to the computationally intensive nature of listing cliques. There are two well-known parallel paradigms for approaching the (r,s) nucleus decomposition problem: a global peeling-based model and a local update model that iterates until convergence. The former is inherently challenging to parallelize due to sequential dependencies and necessary synchronization steps, which we address in this paper; we also demonstrate that the latter requires orders of magnitude more work to converge to the same solution and is thus less performant.
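To make the peeling paradigm concrete, the sketch below implements sequential peeling for the simplest special case, (r,s) = (1,2), where r-cliques are vertices, s-cliques are edges, and the decomposition recovers the classic k-core numbers. This is a minimal illustrative sketch under our own assumptions, not the parallel algorithm from the paper; the function name core_decomposition and the dictionary-of-sets graph representation are choices made here for exposition only.

```python
from collections import defaultdict

def core_decomposition(adj):
    """Sequential peeling for the (1,2) special case of nucleus decomposition,
    i.e. k-core: each vertex (1-clique) is assigned the largest k such that it
    lies in a subgraph where every vertex is in at least k edges (2-cliques).
    `adj` maps each vertex to the set of its neighbors."""
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    # Bucket vertices by current degree so a minimum-degree vertex is always peelable.
    buckets = defaultdict(set)
    for v, d in deg.items():
        buckets[d].add(v)
    core, removed, k = {}, set(), 0
    for _ in range(len(adj)):
        # Peel a vertex of minimum remaining degree; its core number is the
        # largest peeling level seen so far.
        d = min(b for b, vs in buckets.items() if vs)
        k = max(k, d)
        v = buckets[d].pop()
        core[v] = k
        removed.add(v)
        # Removing v decrements the degree of its remaining neighbors.
        for u in adj[v]:
            if u not in removed:
                buckets[deg[u]].discard(u)
                deg[u] -= 1
                buckets[deg[u]].add(u)
    return core

# Tiny usage example: a triangle (0,1,2) with a pendant vertex 3 attached to 2.
if __name__ == "__main__":
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print(core_decomposition(adj))  # vertex 3 gets core number 1; vertices 0, 1, 2 get 2
```

The same peel-by-minimum-value loop generalizes to larger (r,s) by maintaining, for each r-clique, the number of s-cliques containing it; the sequential dependencies visible in this loop are exactly what makes the peeling paradigm challenging to parallelize.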