Analytics in Motion: High Performance Event-Processing AND Real-Time Analytics in the Same Database

Lucas Braun, Thomas Etter, Georgios Gasparis, Martin Kaufmann, Donald Kossmann, Daniel Widmer, Aharon Avitzur, A. Iliopoulos, Eliezer Levy, Ning Liang
{"title":"Analytics in Motion: High Performance Event-Processing AND Real-Time Analytics in the Same Database","authors":"Lucas Braun, Thomas Etter, Georgios Gasparis, Martin Kaufmann, Donald Kossmann, Daniel Widmer, Aharon Avitzur, A. Iliopoulos, Eliezer Levy, Ning Liang","doi":"10.1145/2723372.2742783","DOIUrl":null,"url":null,"abstract":"Modern data-centric flows in the telecommunications industry require real time analytical processing over a rapidly changing and large dataset. The traditional approach of separating OLTP and OLAP workloads cannot satisfy this requirement. Instead, a new class of integrated solutions for handling hybrid workloads is needed. This paper presents an industrial use case and a novel architecture that integrates key-value-based event processing and SQL-based analytical processing on the same distributed store while minimizing the total cost of ownership. Our approach combines several well-known techniques such as shared scans, delta processing, a PAX-fashioned storage layout, and an interleaving of scanning and delta merging in a completely new way. Performance experiments show that our system scales out linearly with the number of servers. For instance, our system sustains event streams of 100,000 events per second while simultaneously processing 100 ad-hoc analytical queries per second, using a cluster of 12 commodity servers. In doing so, our system meets all response time goals of our telecommunication customers; that is, 10 milliseconds per event and 100 milliseconds for an ad-hoc analytical query. Moreover, our system beats commercial competitors by a factor of 2.5 in analytical and two orders of magnitude in update performance.","PeriodicalId":168391,"journal":{"name":"Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data","volume":"11 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"48","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2723372.2742783","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 48

Abstract

Modern data-centric flows in the telecommunications industry require real time analytical processing over a rapidly changing and large dataset. The traditional approach of separating OLTP and OLAP workloads cannot satisfy this requirement. Instead, a new class of integrated solutions for handling hybrid workloads is needed. This paper presents an industrial use case and a novel architecture that integrates key-value-based event processing and SQL-based analytical processing on the same distributed store while minimizing the total cost of ownership. Our approach combines several well-known techniques such as shared scans, delta processing, a PAX-fashioned storage layout, and an interleaving of scanning and delta merging in a completely new way. Performance experiments show that our system scales out linearly with the number of servers. For instance, our system sustains event streams of 100,000 events per second while simultaneously processing 100 ad-hoc analytical queries per second, using a cluster of 12 commodity servers. In doing so, our system meets all response time goals of our telecommunication customers; that is, 10 milliseconds per event and 100 milliseconds for an ad-hoc analytical query. Moreover, our system beats commercial competitors by a factor of 2.5 in analytical and two orders of magnitude in update performance.
动态分析:同一数据库中的高性能事件处理和实时分析
现代电信行业以数据为中心的流程需要对快速变化的大型数据集进行实时分析处理。传统的分离OLTP和OLAP工作负载的方法无法满足这一需求。相反,需要一类新的集成解决方案来处理混合工作负载。本文提出了一个工业用例和一个新颖的体系结构,该体系结构在同一分布式存储上集成了基于键值的事件处理和基于sql的分析处理,同时最大限度地降低了总拥有成本。我们的方法结合了几种众所周知的技术,如共享扫描、增量处理、pax风格的存储布局,以及扫描和增量合并的交错,以一种全新的方式。性能实验表明,我们的系统随服务器数量呈线性扩展。例如,我们的系统使用由12台商品服务器组成的集群,维持每秒100,000个事件的事件流,同时每秒处理100个临时分析查询。因此,我们的系统符合电讯客户的所有响应时间目标;也就是说,每个事件10毫秒,临时分析查询100毫秒。此外,我们的系统在分析性能上比商业竞争对手高出2.5倍,在更新性能上高出两个数量级。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信