{"title":"延迟笛卡尔积的高效子图匹配","authors":"Fei Bi, Lijun Chang, Xuemin Lin, Lu Qin, W. Zhang","doi":"10.1145/2882903.2915236","DOIUrl":null,"url":null,"abstract":"In this paper, we study the problem of subgraph matching that extracts all subgraph isomorphic embeddings of a query graph q in a large data graph G. The existing algorithms for subgraph matching follow Ullmann's backtracking approach; that is, iteratively map query vertices to data vertices by following a matching order of query vertices. It has been shown that the matching order of query vertices is a very important aspect to the efficiency of a subgraph matching algorithm. Recently, many advanced techniques, such as enforcing connectivity and merging similar vertices in query or data graphs, have been proposed to provide an effective matching order with the aim to reduce unpromising intermediate results especially the ones caused by redundant Cartesian products. In this paper, for the first time we address the issue of unpromising results by Cartesian products from \"dissimilar\" vertices. We propose a new framework by postponing the Cartesian products based on the structure of a query to minimize the redundant Cartesian products. Our second contribution is proposing a new path-based auxiliary data structure, with the size O(|E(G)| x |V(q)|), to generate a matching order and conduct subgraph matching, which significantly reduces the exponential size O(|V(G)||V(q)|-1) of the existing path-based auxiliary data structure, where V (G) and E (G) are the vertex and edge sets of a data graph G, respectively, and V (q) is the vertex set of a query $q$. Extensive empirical studies on real and synthetic graphs demonstrate that our techniques outperform the state-of-the-art algorithms by up to $3$ orders of magnitude.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":"516 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2016-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"205","resultStr":"{\"title\":\"Efficient Subgraph Matching by Postponing Cartesian Products\",\"authors\":\"Fei Bi, Lijun Chang, Xuemin Lin, Lu Qin, W. Zhang\",\"doi\":\"10.1145/2882903.2915236\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we study the problem of subgraph matching that extracts all subgraph isomorphic embeddings of a query graph q in a large data graph G. The existing algorithms for subgraph matching follow Ullmann's backtracking approach; that is, iteratively map query vertices to data vertices by following a matching order of query vertices. It has been shown that the matching order of query vertices is a very important aspect to the efficiency of a subgraph matching algorithm. Recently, many advanced techniques, such as enforcing connectivity and merging similar vertices in query or data graphs, have been proposed to provide an effective matching order with the aim to reduce unpromising intermediate results especially the ones caused by redundant Cartesian products. In this paper, for the first time we address the issue of unpromising results by Cartesian products from \\\"dissimilar\\\" vertices. We propose a new framework by postponing the Cartesian products based on the structure of a query to minimize the redundant Cartesian products. Our second contribution is proposing a new path-based auxiliary data structure, with the size O(|E(G)| x |V(q)|), to generate a matching order and conduct subgraph matching, which significantly reduces the exponential size O(|V(G)||V(q)|-1) of the existing path-based auxiliary data structure, where V (G) and E (G) are the vertex and edge sets of a data graph G, respectively, and V (q) is the vertex set of a query $q$. Extensive empirical studies on real and synthetic graphs demonstrate that our techniques outperform the state-of-the-art algorithms by up to $3$ orders of magnitude.\",\"PeriodicalId\":20483,\"journal\":{\"name\":\"Proceedings of the 2016 International Conference on Management of Data\",\"volume\":\"516 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-06-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"205\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2016 International Conference on Management of Data\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2882903.2915236\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2016 International Conference on Management of Data","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2882903.2915236","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 205
摘要
本文研究了在大数据图g中提取查询图q的所有子图同构嵌入的子图匹配问题,现有的子图匹配算法采用Ullmann回溯法;也就是说,通过遵循查询顶点的匹配顺序,迭代地将查询顶点映射到数据顶点。研究表明,查询顶点的匹配顺序是影响子图匹配算法效率的一个重要方面。近年来,人们提出了许多先进的技术,如在查询图或数据图中加强连通性和合并相似顶点,以提供有效的匹配顺序,以减少不希望的中间结果,特别是由冗余笛卡尔积引起的中间结果。在本文中,我们首次解决了由“不相似”顶点的笛卡尔积产生的不乐观结果的问题。我们提出了一种基于查询结构的延迟笛卡尔积的新框架,以最小化冗余的笛卡尔积。我们的第二个贡献是提出了一个新的基于路径的辅助数据结构,其大小为O(|E(G)| x |V(q)|),用于生成匹配顺序并进行子图匹配,这大大减少了现有基于路径的辅助数据结构的指数大小O(|V(G)||V(q)|-1),其中V(G)和E(G)分别是数据图G的顶点集和边集,V(q)是查询$q$的顶点集。对真实图和合成图的广泛实证研究表明,我们的技术比最先进的算法性能高出3个数量级。
Efficient Subgraph Matching by Postponing Cartesian Products
In this paper, we study the problem of subgraph matching that extracts all subgraph isomorphic embeddings of a query graph q in a large data graph G. The existing algorithms for subgraph matching follow Ullmann's backtracking approach; that is, iteratively map query vertices to data vertices by following a matching order of query vertices. It has been shown that the matching order of query vertices is a very important aspect to the efficiency of a subgraph matching algorithm. Recently, many advanced techniques, such as enforcing connectivity and merging similar vertices in query or data graphs, have been proposed to provide an effective matching order with the aim to reduce unpromising intermediate results especially the ones caused by redundant Cartesian products. In this paper, for the first time we address the issue of unpromising results by Cartesian products from "dissimilar" vertices. We propose a new framework by postponing the Cartesian products based on the structure of a query to minimize the redundant Cartesian products. Our second contribution is proposing a new path-based auxiliary data structure, with the size O(|E(G)| x |V(q)|), to generate a matching order and conduct subgraph matching, which significantly reduces the exponential size O(|V(G)||V(q)|-1) of the existing path-based auxiliary data structure, where V (G) and E (G) are the vertex and edge sets of a data graph G, respectively, and V (q) is the vertex set of a query $q$. Extensive empirical studies on real and synthetic graphs demonstrate that our techniques outperform the state-of-the-art algorithms by up to $3$ orders of magnitude.