FAST: A Scalable Subgraph Matching Framework over Large Graphs

2022 IEEE High Performance Extreme Computing Conference (HPEC) Pub Date : 2022-09-19 DOI:10.1109/HPEC55821.2022.9926298

Jiezhong He, Zhouyang Liu, Yixing Chen, H. Pan, Zhen Huang, Dongsheng Li



Abstract

As one of the most fundamental operations in graph analysis, subgraph matching is widely used in fields such as social network analysis, knowledge graph querying, and fraud detection. Because the problem is NP-complete, subgraph matching is challenging on large graphs. Previous work is limited in either scalability or the types of queries it can handle. To address these problems, we propose a fast, scalable subgraph matching framework that consists of filtering, ordering, and enumeration stages. We exploit the parallelism in the filtering stage and design a learning-based filtering method to remove false matching candidates, propose heuristic constraint and ordering generation methods to improve matching efficiency, and devise a distributed enumeration algorithm that is further optimized by a graph cache. Our learning-based filtering method delivers over 90% accuracy for basic queries. Compared with PruneJuice, our matching framework achieves a 2–8× speedup in triangle enumeration and up to 3–4 orders of magnitude higher throughput on generic query enumeration. The caching mechanism further boosts performance by about 1.5× to 2.5× on average. Experiments also demonstrate the scalability of our framework.

This work is supported by the Open Fund of Science and Technology on Parallel and Distributed Processing Laboratory (PDL); the grant number is WDZC2020SS00101. The source code is available at https://github.com/yixinchen200S/FAST.
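
To make the filtering, ordering, and enumeration stages concrete, here is a minimal single-machine sketch in Python. It is not the FAST implementation: the label-and-degree filter, the greedy candidate-count ordering, and the plain backtracking search are generic textbook stand-ins chosen for illustration, and every function and variable name is hypothetical. The learning-based filter, the constraint generation, the distributed enumeration, and the graph cache described in the abstract are not modeled.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # undirected graph as adjacency sets


def filter_candidates(query: Graph, q_labels: Dict[int, str],
                      data: Graph, d_labels: Dict[int, str]) -> Dict[int, Set[int]]:
    """Filtering: keep data vertices with a matching label and sufficient degree."""
    return {
        u: {v for v in data
            if d_labels[v] == q_labels[u] and len(data[v]) >= len(query[u])}
        for u in query
    }


def matching_order(query: Graph, cands: Dict[int, Set[int]]) -> List[int]:
    """Ordering: start from the query vertex with the fewest candidates,
    then prefer vertices adjacent to those already placed."""
    order = [min(query, key=lambda u: len(cands[u]))]
    while len(order) < len(query):
        placed = set(order)
        frontier = [u for u in query if u not in placed and query[u] & placed]
        pool = frontier or [u for u in query if u not in placed]
        order.append(min(pool, key=lambda u: len(cands[u])))
    return order


def enumerate_matches(query: Graph, data: Graph,
                      cands: Dict[int, Set[int]], order: List[int]) -> List[Dict[int, int]]:
    """Enumeration: backtracking over injective, edge-preserving mappings."""
    results: List[Dict[int, int]] = []

    def extend(depth: int, mapping: Dict[int, int]) -> None:
        if depth == len(order):
            results.append(dict(mapping))
            return
        u = order[depth]
        for v in cands[u]:
            if v in mapping.values():
                continue  # each data vertex may be used at most once
            # every already-mapped query neighbor of u must be adjacent to v
            if all(mapping[w] in data[v] for w in query[u] if w in mapping):
                mapping[u] = v
                extend(depth + 1, mapping)
                del mapping[u]

    extend(0, {})
    return results


# Toy input (assumed for illustration): a triangle query against a
# triangle {0, 1, 2} with a pendant vertex 3 attached to vertex 2.
triangle: Graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
data_graph: Graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
q_labels = {u: "A" for u in triangle}
d_labels = {v: "A" for v in data_graph}

cands = filter_candidates(triangle, q_labels, data_graph, d_labels)
order = matching_order(triangle, cands)
matches = enumerate_matches(triangle, data_graph, cands, order)
```

On this toy input the degree filter removes vertex 3 from every candidate set before enumeration starts. Because the sketch applies no symmetry breaking, the single triangle in the data graph is reported once per automorphism of the query.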


