A Twig-Based Algorithm for Top-k Subgraph Matching in Large-Scale Graph Data
Haiwei Zhang, Qijie Bai, Yining Lian, Yanlong Wen
Big Data Research, Volume 30, Article 100350. Published 2022-11-28.
DOI: 10.1016/j.bdr.2022.100350
https://www.sciencedirect.com/science/article/pii/S2214579622000442
Citations: 0
Abstract
Subgraph matching finds substructures of a single data graph that are similar to a given query graph, and it is a basic query in graph data management. Subgraph matching solutions fall into several categories. Subgraph isomorphism, an NP-complete problem, is the original formulation of the task. To speed up matching, graph simulation was introduced, which matches subgraphs in polynomial time. Unfortunately, graph simulation often loses the topology of the matched subgraphs because of its loose restrictions. In this paper, we propose an approximation approach named kSGM (top-k Subgraph Graph Matching) for subgraph matching based on twig patterns. First, we transform query graphs into twig patterns and match candidate substructures in the graph data. Second, we present an optimized join strategy with a top-k mechanism, including join order selection based on cost evaluation and optimized pruning based on the maximum/minimum possible score. Finally, we design experiments on real-life and synthetic graph data to evaluate the performance of our work. The results show that, compared to existing algorithms, kSGM clearly reduces the time complexity and guarantees correctness when answering subgraph matching queries.
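To make the remark about graph simulation losing topology concrete, the following is a minimal sketch of the standard fixpoint computation of a maximal simulation relation. It is not the kSGM algorithm from the paper; the function name `graph_simulation`, the dictionary-based graph encoding, and the toy graphs are all assumptions chosen for illustration.

```python
from collections import defaultdict


def graph_simulation(q_labels, q_edges, g_labels, g_edges):
    """Compute the maximal simulation relation of a query graph on a data graph.

    Illustrative sketch only (hypothetical helper, not the paper's kSGM).
    q_labels / g_labels map node -> label; q_edges / g_edges are (src, dst) pairs.
    Returns {query_node: set of data nodes that simulate it}.
    """
    q_succ = defaultdict(set)
    for u, v in q_edges:
        q_succ[u].add(v)
    g_succ = defaultdict(set)
    for x, y in g_edges:
        g_succ[x].add(y)

    # Start from every label-compatible (query node, data node) pair.
    sim = {
        u: {x for x, label in g_labels.items() if label == q_labels[u]}
        for u in q_labels
    }

    # Repeatedly drop data nodes that cannot follow some outgoing query edge.
    changed = True
    while changed:
        changed = False
        for u in q_labels:
            for x in list(sim[u]):
                # x survives only if, for each query edge u -> v,
                # x has at least one successor that still simulates v.
                if any(not (g_succ[x] & sim[v]) for v in q_succ[u]):
                    sim[u].discard(x)
                    changed = True
    return sim


# Query: one A node with two distinct B children.
q_labels = {"A": "A", "B1": "B", "B2": "B"}
q_edges = [("A", "B1"), ("A", "B2")]

# Data: one A node with a single B child.
g_labels = {"a": "A", "b": "B"}
g_edges = [("a", "b")]

result = graph_simulation(q_labels, q_edges, g_labels, g_edges)
print(result)
```

Here every query node ends up with a non-empty match set, so simulation reports a match, with both `B1` and `B2` mapped to the same data node `b`. Subgraph isomorphism would reject this data graph, since three query nodes cannot be mapped injectively onto two data nodes. This is the kind of structural information that the looser simulation condition gives up in exchange for polynomial running time.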
About the journal:
The journal aims to promote and communicate advances in big data research by providing a fast, high-quality forum for researchers, practitioners and policy makers from the many communities working on and with this topic.
The journal will accept papers on foundational aspects in dealing with big data, as well as papers on specific Platforms and Technologies used to deal with big data. To promote Data Science and interdisciplinary collaboration between fields, and to showcase the benefits of data driven research, papers demonstrating applications of big data in domains as diverse as Geoscience, Social Web, Finance, e-Commerce, Health Care, Environment and Climate, Physics and Astronomy, Chemistry, life sciences and drug discovery, digital libraries and scientific publications, security and government will also be considered. Occasionally the journal may publish whitepapers on policies, standards and best practices.