FlatFIT: Accelerated Incremental Sliding-Window Aggregation For Real-Time Analytics

Proceedings of the 29th International Conference on Scientific and Statistical Database Management
Publication date: 2017-06-27
DOI: 10.1145/3085504.3085509

Anatoli U. Shein, Panos K. Chrysanthis, Alexandros Labrinidis

{"title":"FlatFIT: Accelerated Incremental Sliding-Window Aggregation For Real-Time Analytics","authors":"Anatoli U. Shein, Panos K. Chrysanthis, Alexandros Labrinidis","doi":"10.1145/3085504.3085509","DOIUrl":null,"url":null,"abstract":"Data stream processing is becoming essential in most current advanced scientific or business applications as data production rates are increasing. Different companies compete to efficiently ingest high velocity data and apply some form of computation in order to make better business decisions. In order to successfully compete in this environment, companies are focusing on the most recent data within a count or time-based window by continuously executing aggregate queries on it. Incremental sliding-window computation is commonly used to avoid the performance implications of re-evaluating the aggregate value of the window from scratch on every update. The state-of-the-art FlatFAT technique executes ACQs with high efficiency but it does not scale well with the increasing workloads. In this paper we propose a novel algorithm, FlatFIT, that accelerates such calculations by intelligently maintaining index structures, leading to higher reuse of intermediate calculations and thus exceptional scalability in systems with heavy workloads. Our theoretical analysis shows that FlatFIT is superior in both time and space complexities compared to FlatFAT, while maintaining the same query generality. Given a window of size n, FlatFIT achieves constant algorithmic complexity compared to O(log(n)) complexity of FlatFAT. 
We experimentally show that FlatFIT achieves up to a 17x throughput improvement over FlatFAT for the same input workload while using less memory.","PeriodicalId":431308,"journal":{"name":"Proceedings of the 29th International Conference on Scientific and Statistical Database Management","volume":"298 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"24","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 29th International Conference on Scientific and Statistical Database Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3085504.3085509","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Citations: 24

Abstract

Data stream processing is becoming essential in most of today's advanced scientific and business applications as data production rates increase. Companies compete to ingest high-velocity data efficiently and apply some form of computation to it in order to make better business decisions. To compete successfully in this environment, companies focus on the most recent data within a count- or time-based window by continuously executing aggregate queries over it. Incremental sliding-window computation is commonly used to avoid the cost of re-evaluating the aggregate value of the window from scratch on every update. The state-of-the-art FlatFAT technique executes aggregate continuous queries (ACQs) with high efficiency, but it does not scale well with increasing workloads. In this paper we propose a novel algorithm, FlatFIT, that accelerates such calculations by intelligently maintaining index structures, which leads to greater reuse of intermediate calculations and thus exceptional scalability in systems with heavy workloads. Our theoretical analysis shows that FlatFIT has better time and space complexity than FlatFAT while maintaining the same query generality. Given a window of size n, FlatFIT achieves constant algorithmic complexity, compared to the O(log(n)) complexity of FlatFAT. We experimentally show that FlatFIT achieves up to a 17x throughput improvement over FlatFAT for the same input workload while using less memory.
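The contrast drawn above, between re-evaluating the window from scratch and reusing intermediate results, can be illustrated with a small Python sketch. It is written only from the abstract's one-line description (index structures that let later queries reuse earlier partial aggregates) and is not the authors' FlatFIT implementation; the class `PointerJumpWindow`, its `insert`/`query` methods, and the helper `aggregate_from_scratch` are hypothetical names. The sketch assumes a count-based window of fixed size n that slides one item at a time, an associative `combine` operator with an `identity` element (so the window can start pre-filled with identities), and it makes no claim to reproduce the paper's constant-time bound. Each slot stores a partial aggregate plus a pointer to the slot just past the range it covers; a query follows these pointers from the oldest slot to the newest, then redirects every visited slot to the end of the window so the next query can skip that work.

```python
from collections import deque
from functools import reduce
from operator import add
from typing import Callable, Generic, Iterable, List, TypeVar

T = TypeVar("T")


def aggregate_from_scratch(window: Iterable[T], combine: Callable[[T, T], T], identity: T) -> T:
    """Baseline: re-evaluate the whole window, O(n) work on every query."""
    return reduce(combine, window, identity)


class PointerJumpWindow(Generic[T]):
    """Hypothetical count-based sliding window that reuses partial aggregates.

    Illustrative sketch only, not the published FlatFIT algorithm.
    Assumes `combine` is associative and `identity` is its neutral element.
    """

    def __init__(self, n: int, combine: Callable[[T, T], T], identity: T) -> None:
        if n < 1:
            raise ValueError("window size must be at least 1")
        self.n = n
        self.combine = combine
        # partials[i] aggregates slots i .. ptr[i]-1 (circular, oldest to newest).
        self.partials: List[T] = [identity] * n
        self.ptr: List[int] = [(i + 1) % n for i in range(n)]
        self.newest = n - 1  # so the first insert lands in slot 0

    def insert(self, value: T) -> None:
        # The new item overwrites the oldest slot; that slot's old partial is discarded.
        self.newest = (self.newest + 1) % self.n
        self.partials[self.newest] = value
        self.ptr[self.newest] = (self.newest + 1) % self.n

    def query(self) -> T:
        oldest = (self.newest + 1) % self.n
        # Jump from the oldest slot until reaching a partial that ends at the
        # newest slot (its pointer then wraps around to the oldest slot).
        path = [oldest]
        while self.ptr[path[-1]] != oldest:
            path.append(self.ptr[path[-1]])
        # Combine right to left, storing each suffix result and redirecting
        # pointers to the window end so later queries can reuse this work.
        acc = self.partials[path[-1]]
        for i in reversed(path[:-1]):
            acc = self.combine(self.partials[i], acc)
            self.partials[i] = acc
            self.ptr[i] = oldest
        return acc


if __name__ == "__main__":
    win = PointerJumpWindow(4, add, "")  # string concatenation: order matters
    ref = deque([""] * 4, maxlen=4)  # explicit window for the baseline
    for ch in "abcdef":
        win.insert(ch)
        ref.append(ch)
        result = win.query()
        assert result == aggregate_from_scratch(ref, add, "")
        print(result)  # prints a, ab, abc, abcd, bcde, cdef (one per line)
```

Because `combine` is applied strictly in arrival order, the sketch also handles non-commutative operators such as string concatenation, which is why the example uses one. `aggregate_from_scratch` stands in for the per-update recomputation that incremental techniques such as FlatFAT and FlatFIT are designed to avoid.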
