Detection of Highly Correlated Live Data Streams

Rakan Alseghayer, Daniel Petrov, Panos K. Chrysanthis, M. Sharaf, Alexandros Labrinidis
Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics, published 2017-08-28
DOI: 10.1145/3129292.3129298 (https://doi.org/10.1145/3129292.3129298)
Citations: 6

Abstract

More and more organizations (commercial, health, government and security) currently base their decisions on real-time analysis of fast-arriving, high-volume data streams. For such analysis to yield actionable information in real time and at the right time, the most recent data needs to be processed within a specified delay target. Effective solutions for analyzing such data streams rely on two techniques: (1) incremental sliding-window computation of aggregates, to avoid unnecessary recomputation, and (2) intelligent scheduling of computational steps and operations. In this paper, we propose a solution that combines both techniques to find highly correlated data streams in real time, using the Pearson Correlation Coefficient as the correlation metric for two windows of data streams. Specifically, we propose to partition a set of data streams into micro-batches that capture the delay target, use sliding windows within a range as the subsequences of values exhibiting a certain level of correlation, utilize the idea of sufficient statistics to incrementally compute the Pearson Correlation Coefficient of pairs of sliding windows, and adopt deadline-aware priority scheduling to detect the highly correlated pairs of data streams. Our experimental results show that our scheme, and in particular our Price-DCS with warm start scheduling algorithm, outperforms existing ones and enables a high degree of interactivity in correlating micro-batches of live data streams.
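The incremental computation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `SlidingPearson`, its methods, and the single fixed window size are assumptions made for illustration, and the micro-batching and scheduling parts of the scheme are left out because the abstract does not specify them. The sketch keeps five running sums (the sufficient statistics) for one pair of streams, so that sliding the window by one value costs constant time instead of a full recomputation.

```python
from collections import deque
from math import sqrt


class SlidingPearson:
    """Pearson correlation over the last `window` aligned pairs of two streams.

    Hypothetical helper for illustration; not taken from the paper.
    """

    def __init__(self, window):
        if window < 2:
            raise ValueError("window must hold at least 2 pairs")
        self.window = window
        self.pairs = deque()
        # Sufficient statistics: sums of x, y, x*x, y*y and x*y.
        self.sx = self.sy = self.sxx = self.syy = self.sxy = 0.0

    def _update(self, x, y, sign):
        # sign is +1.0 to add a pair to the sums and -1.0 to remove it.
        self.sx += sign * x
        self.sy += sign * y
        self.sxx += sign * x * x
        self.syy += sign * y * y
        self.sxy += sign * x * y

    def push(self, x, y):
        """Add the newest pair and evict the oldest one if the window overflows."""
        self.pairs.append((x, y))
        self._update(x, y, +1.0)
        if len(self.pairs) > self.window:
            old_x, old_y = self.pairs.popleft()
            self._update(old_x, old_y, -1.0)

    def correlation(self):
        """Return Pearson's r for the current window, or None if undefined."""
        n = len(self.pairs)
        if n < 2:
            return None
        var_x = n * self.sxx - self.sx ** 2
        var_y = n * self.syy - self.sy ** 2
        if var_x <= 0 or var_y <= 0:
            return None  # a constant window has no defined correlation
        return (n * self.sxy - self.sx * self.sy) / sqrt(var_x * var_y)
```

A caller would create one `SlidingPearson` per pair of streams, call `push(x, y)` for each newly arrived pair of values, and compare `correlation()` against a chosen threshold. Running sums of this kind can lose precision when values are large relative to their variance, so a production version would likely need a more numerically careful update.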