Shared Execution Techniques for Business Data Analytics over Big Data Streams

Serkan Uzunbaz, Walid G. Aref
Published in: 32nd International Conference on Scientific and Statistical Database Management, 2020-07-07
DOI: 10.1145/3400903.3400932 (https://doi.org/10.1145/3400903.3400932)
Citation count: 1

Abstract

Business data analytics requires processing large numbers of data streams and creating materialized views in order to provide near real-time answers to user queries. Materializing the view of each query and refreshing it continuously as a separate query execution plan is neither efficient nor scalable. In this paper, we present a global query execution plan that supports multiple queries simultaneously and minimizes the number of input scans, operators, and tuples flowing between the operators. We propose shared-execution techniques for creating and maintaining materialized views in support of business data analytics queries. We exploit commonalities among multiple business data analytics queries to support scalable and efficient processing of big data streams. The paper highlights shared-execution techniques for select predicates, grouping, and aggregate calculations. We show how global query execution plans run in a distributed stream processing system called INGA, which is built on top of Storm. In INGA, we are able to support online view maintenance of 2500 materialized views using 237 queries by utilizing the shared constructs between the queries. We are able to run all 237 queries using a single global query execution plan tree with a depth of 21.
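To make the idea of shared execution concrete, the following is a minimal single-process sketch in Python. It is not INGA's implementation and does not use Storm. The query set, the predicate names, the `SharedPlan` class, and the sample stream are all hypothetical, chosen only to illustrate how a shared plan can evaluate each distinct select predicate once per tuple and maintain one grouped SUM view per distinct (predicate, group-by column, aggregated column) combination.

```python
from collections import defaultdict

# Hypothetical queries: (query id, predicate name, group-by column, summed column).
QUERIES = [
    ("q1", "big_order", "region", "amount"),
    ("q2", "big_order", "region", "amount"),   # same as q1: shares the whole view
    ("q3", "big_order", "product", "amount"),  # shares only the select predicate
    ("q4", "all", "region", "amount"),
]

# Hypothetical select predicates, keyed by name so equal predicates can be shared.
PREDICATES = {
    "big_order": lambda t: t["amount"] >= 100,
    "all": lambda t: True,
}


class SharedPlan:
    """One plan for all queries; identical predicates and views are kept once."""

    def __init__(self, queries):
        # Queries with the same (predicate, group, aggregate) key share one view.
        self.view_of = {qid: (pred, grp, agg) for qid, pred, grp, agg in queries}
        self.view_keys = sorted(set(self.view_of.values()))
        self.pred_names = sorted({pred for pred, _, _ in self.view_keys})
        # view key -> group value -> running SUM
        self.views = {key: defaultdict(float) for key in self.view_keys}
        self.predicate_evals = 0

    def process(self, tup):
        # Evaluate each distinct predicate once per incoming tuple.
        passed = {}
        for name in self.pred_names:
            passed[name] = PREDICATES[name](tup)
            self.predicate_evals += 1
        # Incrementally refresh each distinct view once.
        for pred, grp, agg in self.view_keys:
            if passed[pred]:
                self.views[(pred, grp, agg)][tup[grp]] += tup[agg]

    def answer(self, qid):
        # A query reads the shared view it was mapped to.
        return dict(self.views[self.view_of[qid]])


# Hypothetical input stream.
stream = [
    {"region": "EU", "product": "A", "amount": 150},
    {"region": "EU", "product": "B", "amount": 50},
    {"region": "US", "product": "A", "amount": 200},
]

plan = SharedPlan(QUERIES)
for t in stream:
    plan.process(t)

print(plan.answer("q1"))
print(plan.answer("q3"))
print(len(plan.view_keys), plan.predicate_evals)
```

In this toy setting, four queries are served by three maintained views and two predicate evaluations per tuple, whereas four separate plans would each scan the input and evaluate their own predicate. A distributed system would additionally have to partition state and route tuples between operators, which this sketch leaves out.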