A Twig-Based Algorithm for Top-k Subgraph Matching in Large-Scale Graph Data

Impact factor 3.5 | Zone 3 (Computer Science) | JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Big Data Research, Vol. 30, Article 100350 | Published 2022-11-28 | DOI: 10.1016/j.bdr.2022.100350

Haiwei Zhang, Qijie Bai, Yining Lian, Yanlong Wen


Citations: 0

Abstract

Subgraph matching aims to find substructures in a single data graph that are similar to a given query graph, and it is a basic query in graph data management. Subgraph matching solutions fall into several categories. Subgraph isomorphism, an NP-complete problem, was the initial solution to the subgraph matching task. To speed up matching, graph simulation was proposed, which matches subgraphs in polynomial time. Unfortunately, because its restrictions are loose, graph simulation often loses the topology of the matched subgraphs. In this paper, we propose an approximation approach named kSGM (top-k SubGraph Matching) for subgraph matching based on twig patterns. First, we transform query graphs into twig patterns and match candidate substructures in the data graph. Second, we present an optimized join strategy with a top-k mechanism, including join order selection based on cost evaluation and pruning based on maximum/minimum possible scores. Finally, we conduct experiments on real-life and synthetic graph data to evaluate the performance of our approach. The results show that, compared with existing algorithms, kSGM significantly reduces time complexity while guaranteeing correct answers to subgraph matching queries.
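For contrast with kSGM, the following is a minimal sketch of plain graph simulation, the classic fixpoint refinement rather than code from the paper. It assumes directed, node-labelled graphs given as label dicts and edge lists; the function name `graph_simulation` is hypothetical. The repeated refinement loop runs in polynomial time, and the usage example shows the topology loss described above: a 2-cycle query is simulated by a 4-cycle even though the data graph contains no 2-cycle.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple

Node = Hashable
Edge = Tuple[Node, Node]


def graph_simulation(
    q_labels: Dict[Node, str],
    q_edges: Iterable[Edge],
    g_labels: Dict[Node, str],
    g_edges: Iterable[Edge],
) -> Dict[Node, Set[Node]]:
    """Maximum graph simulation of a query graph by a data graph (hypothetical helper).

    Returns query node -> set of data nodes that simulate it, or {} if some
    query node ends up with no match (i.e. no simulation exists).
    """
    g_succ: Dict[Node, Set[Node]] = defaultdict(set)
    for src, dst in g_edges:
        g_succ[src].add(dst)
    q_succ: Dict[Node, Set[Node]] = defaultdict(set)
    for src, dst in q_edges:
        q_succ[src].add(dst)

    # Start from all label-compatible candidates.
    sim = {u: {v for v, lab in g_labels.items() if lab == q_labels[u]} for u in q_labels}

    # Refine to a fixpoint: v may simulate u only if, for every query edge
    # (u, u2), v has some successor that simulates u2.
    changed = True
    while changed:
        changed = False
        for u in q_labels:
            for u2 in q_succ[u]:
                keep = {v for v in sim[u] if g_succ[v] & sim[u2]}
                if keep != sim[u]:
                    sim[u] = keep
                    changed = True

    return sim if all(sim.values()) else {}


if __name__ == "__main__":
    # Query: a 2-cycle A <-> B. Data: a 4-cycle 1->2->3->4->1 plus an A-node 5 with no edges.
    sim = graph_simulation(
        {"a": "A", "b": "B"}, [("a", "b"), ("b", "a")],
        {1: "A", 2: "B", 3: "A", 4: "B", 5: "A"}, [(1, 2), (2, 3), (3, 4), (4, 1)],
    )
    print(sim)  # {'a': {1, 3}, 'b': {2, 4}}
```

Node 5 is pruned because it has no B-labelled successor, but nodes 1 to 4 all survive: simulation only checks each node's local neighbourhood, so it cannot tell that the matched structure is a 4-cycle rather than the queried 2-cycle.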

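The top-k join with score-based pruning can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: the query has already been split into twigs whose candidate matches are given as `(score, mapping)` pairs, a full match scores the sum of its twig scores, and the cost evaluation is replaced by a hypothetical fewest-candidates-first join order. The name `top_k_join` and the data layout are illustrative only. The sketch prunes with the maximum possible score of a partial match against the k-th best complete score found so far; it does not reproduce how kSGM uses minimum possible scores.

```python
import heapq
from typing import Dict, List, Tuple

Match = Dict[str, int]           # query node -> data node
Candidate = Tuple[float, Match]  # (twig match score, mapping)


def top_k_join(twigs: List[List[Candidate]], k: int) -> List[Candidate]:
    """Up to k best consistent, injective combinations of one match per twig.

    Hypothetical helper: a full match scores the sum of its twig-match scores.
    """
    if k <= 0 or not twigs or any(not cands for cands in twigs):
        return []

    # Hypothetical cost model: join the twig with the fewest candidates first.
    order = sorted(range(len(twigs)), key=lambda i: len(twigs[i]))
    lists = [sorted(twigs[i], key=lambda c: c[0], reverse=True) for i in order]

    # suffix_max[d] = best score still obtainable from twigs d, d+1, ...
    suffix_max = [0.0] * (len(lists) + 1)
    for d in range(len(lists) - 1, -1, -1):
        suffix_max[d] = suffix_max[d + 1] + lists[d][0][0]

    heap: List[Tuple[float, int, Match]] = []  # min-heap holding the current top-k
    seq = 0                                     # tie-breaker so dicts are never compared

    def threshold() -> float:
        # Score a new full match must beat: the k-th best score found so far.
        return heap[0][0] if len(heap) == k else float("-inf")

    def extend(depth: int, score: float, mapping: Match) -> None:
        nonlocal seq
        if depth == len(lists):
            item = (score, seq, mapping)
            seq += 1
            if len(heap) < k:
                heapq.heappush(heap, item)
            elif score > heap[0][0]:
                heapq.heapreplace(heap, item)
            return
        for cand_score, cand in lists[depth]:
            # Maximum possible score of any full match built on this candidate.
            if score + cand_score + suffix_max[depth + 1] <= threshold():
                break  # candidates are sorted by score, so the rest cannot do better
            if any(mapping.get(q, v) != v for q, v in cand.items()):
                continue  # disagrees with an earlier twig on a shared query node
            merged = {**mapping, **cand}
            if len(set(merged.values())) < len(merged):
                continue  # two query nodes would map to the same data node
            extend(depth + 1, score + cand_score, merged)

    extend(0, 0.0, {})
    return [(s, m) for s, _, m in sorted(heap, reverse=True)]


if __name__ == "__main__":
    # Query u1 - u2 - u3 split into twigs {u1, u2} and {u2, u3}.
    twig_a = [(0.5, {"u1": 1, "u2": 2}), (1.0, {"u1": 3, "u2": 4})]
    twig_b = [(1.0, {"u2": 2, "u3": 5}), (0.25, {"u2": 4, "u3": 6}), (0.75, {"u2": 4, "u3": 7})]
    for score, m in top_k_join([twig_a, twig_b], k=2):
        print(score, m)
    # 1.75 {'u1': 3, 'u2': 4, 'u3': 7}
    # 1.5 {'u1': 1, 'u2': 2, 'u3': 5}
```

Sorting each candidate list by descending score makes the upper bound non-increasing along the list, so the loop can stop at the first candidate whose bound falls to the threshold instead of merely skipping it. In the example, the third consistent combination (score 1.25) is never completed once two better matches are held.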



Source journal

Big Data Research (Computer Science: Computer Science Applications)

CiteScore

8.40

Self-citation rate

3.00%


Journal description: The journal aims to promote and communicate advances in big data research by providing a fast, high-quality forum for researchers, practitioners and policy makers from the many different communities working on, and with, this topic. The journal will accept papers on foundational aspects of dealing with big data, as well as papers on specific platforms and technologies used to deal with big data. To promote data science and interdisciplinary collaboration between fields, and to showcase the benefits of data-driven research, papers demonstrating applications of big data in domains as diverse as Geoscience, Social Web, Finance, e-Commerce, Health Care, Environment and Climate, Physics and Astronomy, Chemistry, life sciences and drug discovery, digital libraries and scientific publications, security and government will also be considered. Occasionally the journal may publish whitepapers on policies, standards and best practices.