Incremental graph pattern matching

IF 1.7 | Zone 2, Computer Science | JCR Q3, COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Database Systems | Pub Date: 2013-08-01 | DOI: 10.1145/2489791

Wenfei Fan, Xin Wang, Yinghui Wu

Citations: 130

Abstract

Graph pattern matching is commonly used in a variety of emerging applications such as social network analysis. These applications highlight the need for studying the following two issues. First, graph pattern matching is traditionally defined in terms of subgraph isomorphism or graph simulation. These notions, however, often impose too strong a topological constraint on graphs to identify meaningful matches. Second, in practice a graph is typically large, and is frequently updated with small changes. It is often prohibitively expensive to recompute matches starting from scratch via batch algorithms when the graph is updated.

This article studies these two issues. (1) We propose to define graph pattern matching based on a notion of bounded simulation, which extends graph simulation by specifying the connectivity of nodes in a graph within a predefined number of hops. We show that bounded simulation is able to find sensible matches that the traditional matching notions fail to catch. We also show that matching via bounded simulation can be done in cubic time, by giving such an algorithm. (2) We provide an account of results on incremental graph pattern matching, for matching defined with graph simulation, bounded simulation, and subgraph isomorphism. We show that the incremental matching problem is unbounded for all these matching notions, that is, its cost is not determined by the size of the changes in the input and output alone. Nonetheless, when matching is defined in terms of simulation or bounded simulation, incremental matching is semibounded, that is, its worst-case time complexity is bounded by a polynomial in the size of the changes in the input, output, and auxiliary information that is necessarily maintained to reuse previous computation, and the size of graph patterns. We also develop incremental matching algorithms for graph simulation and bounded simulation, by minimizing unnecessary recomputation. In contrast, matching based on subgraph isomorphism is neither bounded nor semibounded. (3) We experimentally verify the effectiveness and efficiency of these algorithms, and show that: (a) the revised notion of graph pattern matching allows us to identify communities commonly found in real-life networks, and (b) the incremental algorithms substantially outperform their batch counterparts in response to small changes. These suggest a promising framework for real-life graph pattern matching.
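To make the notion of bounded simulation concrete, the following is a minimal Python sketch, not the article's cubic-time algorithm. It assumes a simplified setting in which pattern nodes match graph nodes by label equality, and each pattern edge carries a hop bound (`None` meaning unbounded; a bound of 1 on every edge gives plain graph simulation). All names (`within_hops`, `bounded_simulation`, the `start` parameter, the toy graph) are hypothetical and chosen for illustration. The `start` parameter illustrates reuse of previous computation in one easy case only: after an edge deletion the maximum match can only shrink, so refinement may begin from the old match rather than from all label candidates. The article's incremental algorithms maintain auxiliary information and handle more general updates.

```python
def within_hops(adj, src, k):
    """Nodes reachable from src by a nonempty path of at most k edges.
    k=None means the path length is unbounded."""
    reached, frontier, depth = set(), [src], 0
    while frontier and (k is None or depth < k):
        depth += 1
        nxt = []
        for v in frontier:
            for w in adj.get(v, ()):
                if w not in reached:
                    reached.add(w)
                    nxt.append(w)
        frontier = nxt
    return reached


def bounded_simulation(p_nodes, p_edges, labels, adj, start=None):
    """Naive fixpoint for the maximum bounded-simulation match.

    p_nodes: {pattern node: label}
    p_edges: {(u, u2): hop bound, or None for unbounded}
    labels:  {graph node: label}
    adj:     {graph node: list of successors}
    start:   optional earlier match to refine (sound after edge deletions only)
    """
    if start is None:
        # Candidates: every graph node carrying the pattern node's label.
        sim = {u: {v for v, lab in labels.items() if lab == plab}
               for u, plab in p_nodes.items()}
    else:
        sim = {u: set(vs) for u, vs in start.items()}
    cache = {}

    def reach(v, k):
        if (v, k) not in cache:
            cache[(v, k)] = within_hops(adj, v, k)
        return cache[(v, k)]

    changed = True
    while changed:
        changed = False
        for (u, u2), k in p_edges.items():
            # Keep v only if some match of u2 lies within k hops of v.
            keep = {v for v in sim[u] if reach(v, k) & sim[u2]}
            if keep != sim[u]:
                sim[u], changed = keep, True
    if any(not vs for vs in sim.values()):
        # Some pattern node has no match, so the graph does not match.
        return {u: set() for u in p_nodes}
    return sim


# Toy data (hypothetical): pattern edge pa -> pb with hop bound 2.
LABELS = {"a1": "A", "a2": "A", "a3": "A",
          "x": "X", "y": "X", "z": "X", "b1": "B", "b2": "B"}
ADJ = {"a1": ["x"], "x": ["b1"], "a2": ["y"], "y": ["z"],
       "z": ["b2"], "a3": ["b1"]}
P_NODES = {"pa": "A", "pb": "B"}
P_EDGES = {("pa", "pb"): 2}

match = bounded_simulation(P_NODES, P_EDGES, LABELS, ADJ)

# Delete edge x -> b1, then refine the old match instead of starting over.
ADJ_AFTER = {**ADJ, "x": []}
updated = bounded_simulation(P_NODES, P_EDGES, LABELS, ADJ_AFTER, start=match)
```

In this toy graph, `a1` and `a3` reach a `B` node within two hops while `a2` needs three, so `match["pa"]` is `{"a1", "a3"}`. After the deletion, `a1` loses its only path to `b1` and `updated["pa"]` becomes `{"a3"}`, which agrees with recomputing from scratch on `ADJ_AFTER`.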

Source journal

ACM Transactions on Database Systems (Engineering & Technology, Computer Science: Software Engineering)

CiteScore: 5.60

Self-citation rate: 0.00%

Review time: >12 weeks

About the journal: Heavily used in both academic and corporate R&D settings, ACM Transactions on Database Systems (TODS) is a key publication for computer scientists working in data abstraction, data modeling, and designing data management systems. Topics include storage and retrieval, transaction management, distributed and federated databases, semantics of data, intelligent databases, and operations and algorithms relating to these areas. In this rapidly changing field, TODS provides insights into the thoughts of the best minds in database R&D.