Strategies for Detection of Correlated Data Streams
Rakan Alseghayer, Daniel Petrov, Panos K. Chrysanthis
Proceedings of the 5th International Workshop on Exploratory Search in Databases and the Web, Houston, Tex., 2018 (published 2018-06-15)
DOI: 10.1145/3214708.3214714
Citation count: 3
Abstract
There is an increasing demand for real-time analysis of large volumes of data streams produced at high velocity. The most recent data must be processed within a specified delay target for the analysis to yield actionable results. In this paper we present an effective solution for analyzing such data streams, based on a three-fold approach that combines (1) incremental sliding-window computation of aggregates, to avoid unnecessary recomputation; (2) intelligent scheduling of computation steps and operations within a micro-batch, driven by a utility function; and (3) an exploration strategy that tunes the utility function. Specifically, we propose eight strategies that explore correlated pairs of live data streams across consecutive micro-batches. Our experimental evaluation on a real dataset shows that some strategies are better suited to identifying large numbers of correlated pairs already known from previous micro-batches, while others are better suited to identifying previously unseen correlated pairs across consecutive micro-batches.
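The abstract names the ingredients of the approach but not their formulas. The following is a minimal sketch, assuming the aggregate is the Pearson correlation of a stream pair over a count-based sliding window. Five running sums are updated in O(1) per new point instead of being recomputed over the whole window. The helpers `pair_utility` and `schedule_pairs` are hypothetical and are not among the paper's eight strategies. They only illustrate how a utility could rank pairs within a micro-batch under a fixed budget, with `explore_weight` standing in for the knob an exploration strategy might tune.

```python
import heapq
import math
from collections import deque
from typing import Dict, List, Optional, Tuple

Pair = Tuple[str, str]


class SlidingPairCorrelation:
    """Pearson correlation of two streams over the last `window` points.

    Assumption: the maintained aggregates are sum(x), sum(y), sum(x^2),
    sum(y^2) and sum(x*y); each arriving point adds to them and each
    evicted point is subtracted, so no full-window rescan is needed.
    """

    def __init__(self, window: int):
        if window < 2:
            raise ValueError("window must hold at least two points")
        self.window = window
        self.buf: deque = deque()
        self.sx = self.sy = self.sxx = self.syy = self.sxy = 0.0

    def push(self, x: float, y: float) -> None:
        # Incorporate the newest pair into every running sum.
        self.buf.append((x, y))
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.syy += y * y
        self.sxy += x * y
        # Once the window overflows, subtract the oldest pair (no recomputation).
        if len(self.buf) > self.window:
            ox, oy = self.buf.popleft()
            self.sx -= ox
            self.sy -= oy
            self.sxx -= ox * ox
            self.syy -= oy * oy
            self.sxy -= ox * oy

    def correlation(self) -> float:
        n = len(self.buf)
        if n < 2:
            return float("nan")
        cov = n * self.sxy - self.sx * self.sy
        var_x = n * self.sxx - self.sx ** 2
        var_y = n * self.syy - self.sy ** 2
        if var_x <= 0 or var_y <= 0:
            return float("nan")  # correlation is undefined for a constant stream
        return cov / math.sqrt(var_x * var_y)


def pair_utility(prev_corr: Dict[Pair, Optional[float]],
                 explore_weight: float) -> Dict[Pair, float]:
    """Hypothetical utility: |r| from the previous micro-batch for scored
    pairs, or a flat exploration bonus for pairs with no score (None/NaN)."""
    util: Dict[Pair, float] = {}
    for pair, r in prev_corr.items():
        if r is None or math.isnan(r):
            util[pair] = explore_weight
        else:
            util[pair] = abs(r)
    return util


def schedule_pairs(util: Dict[Pair, float], budget: int) -> List[Pair]:
    """Choose the `budget` highest-utility pairs to evaluate in this micro-batch."""
    return heapq.nlargest(budget, util, key=util.get)
```

For example, with previous scores `{("a", "b"): 0.9, ("a", "c"): 0.2, ("b", "c"): None}` and a budget of two, an `explore_weight` of 0.5 schedules `("a", "b")` and the unscored `("b", "c")`, while 0.1 schedules `("a", "b")` and `("a", "c")`. Because the squared sums are maintained by repeated addition and subtraction, floating-point error can accumulate on long-running streams, so a fuller implementation might periodically recompute them from the buffer.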