CASQD: continuous detection of activity-based subgraph pattern queries on dynamic graphs
J. Mondal, A. Deshpande
Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems, 2016-06-13
DOI: 10.1145/2933267.2933316 (https://doi.org/10.1145/2933267.2933316)
Citations: 10
Abstract
The ability to detect and analyze interesting subgraph patterns on large, dynamic graph-structured data in near real time is crucial for many applications; examples include anomaly detection in phone call networks, advertisement targeting in social networks, malware detection in file download graphs, and many more. Such patterns often need to reason about how the nodes are connected to each other (i.e., the structural component) as well as how the nodes behave in the network (i.e., the activity component). An example of such an activity-driven subgraph pattern is a clique of users in a social network (the structural predicate), each of whom has posted more than 10 messages in the last 2 hours (the activity-based predicate). In this paper, we present Casqd, a system for continuous detection and analysis of such active subgraph pattern queries over large dynamic graphs. Some of the key challenges in executing such queries include: handling a wide variety of user-specified activities of interest, low selectivities of activity-based predicates and the resultant exponential search space, and high ingestion rates. A key abstraction in Casqd is a notion called graph-view, which acts as an independence layer between the query language and the underlying physical representation of the graph and the active attributes. This abstraction is aimed at simplifying the query language while empowering the query optimizer. Considering the balance between expressibility (i.e., patterns that cover many real-world use cases) and optimizability of such patterns, we primarily focus on efficient continuous detection of active regular structures (specifically, active cliques, active stars, and active bi-cliques). We develop a series of optimization techniques, including model-based neighborhood explorations, lazy evaluation of the activity predicates, neighborhood-based search space pruning, and others, for efficient query evaluation. We perform a thorough comparative study of the execution strategies under various settings, and show that our system is capable of achieving event processing throughputs over 800k/s using a single, powerful machine.
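The active-clique example in the abstract (a clique of users, each with more than 10 posts in the last 2 hours) can be illustrated with a small sketch. This is not Casqd's implementation or query language: the names `ActivityWindow`, `is_active_clique`, and `active_cliques_containing` are hypothetical, the graph is a plain adjacency dictionary, and the enumeration is brute force. It only shows one plausible way to combine a sliding-window activity predicate with a structural check, evaluating the activity predicate after the cheaper structural test and pruning a node's neighborhood to active nodes before searching.

```python
from collections import defaultdict, deque
from itertools import combinations


class ActivityWindow:
    """Per-node sliding-window event counter (hypothetical helper).

    Assumes timestamps for each node arrive in non-decreasing order.
    """

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = defaultdict(deque)

    def record(self, node, timestamp):
        self.events[node].append(timestamp)

    def count(self, node, now):
        q = self.events[node]
        # Drop events that fell out of the window (now - window, now].
        while q and q[0] <= now - self.window:
            q.popleft()
        return len(q)


def is_active_clique(adj, nodes, activity, now, threshold):
    """True if `nodes` are pairwise adjacent and each has > threshold events."""
    nodes = list(nodes)
    # Structural predicate first: cheap set lookups.
    for u, v in combinations(nodes, 2):
        if v not in adj.get(u, set()):
            return False
    # Activity predicate only for candidates that passed the structural test.
    return all(activity.count(n, now) > threshold for n in nodes)


def active_cliques_containing(adj, node, k, activity, now, threshold):
    """Brute-force search for active k-cliques that include `node`."""
    if activity.count(node, now) <= threshold:
        return []
    # Neighborhood pruning: keep only neighbors that are currently active.
    candidates = sorted(
        n for n in adj.get(node, set())
        if activity.count(n, now) > threshold
    )
    found = []
    for rest in combinations(candidates, k - 1):
        if all(v in adj[u] for u, v in combinations(rest, 2)):
            found.append((node,) + rest)
    return found


# Usage: a triangle a-b-c plus a low-activity node d attached to a.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
activity = ActivityWindow(window_seconds=2 * 60 * 60)
for user in ("a", "b", "c"):
    for t in (100, 200, 300):
        activity.record(user, t)
activity.record("d", 150)

triangles = active_cliques_containing(adj, "a", 3, activity, now=400, threshold=2)
```

In a continuous setting, a check like `active_cliques_containing` would be triggered only for nodes touched by an incoming edge or activity event, rather than rescanning the whole graph.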