SINCA: Scalable in-memory event aggregation using clustered operators

M. K. Behera, S. Kalyan, Prasanna Venkatesh, A. Wolski
2015 31st IEEE International Conference on Data Engineering Workshops, 13 April 2015. DOI: 10.1109/ICDEW.2015.7129578
Citations: 1

Abstract

Analytical processing of the various information created in the operation of social media requires queries that group and aggregate large volumes of detail data. Any advanced query processing method should take into account two dominant hardware trends: increasing main-memory capacity and increasing parallel processing capacity, exposed as a growing number of cores per processor chip. We introduce SINCA, a scalable in-memory method for data aggregation that uses clustered operators and profits from these hardware trends. The method uses the concept of a microengine, a set of resources that can be utilized in parallel with great efficiency. The resulting parallelized aggregation algorithm is characterized by low overhead and high volume, and is suitable for both real-time and extract-transform-load (ETL) scenarios. The core idea of the method is to use real-time histograms to partition the data for grouping. Because the data is already grouped during the partitioning phase, group aggregation can be done very efficiently. Additionally, some of the grouped data can be cached for re-use in subsequent queries.
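The abstract gives no algorithmic detail, so the following is only a minimal Python sketch of the general idea it names, not SINCA's actual design. It assumes a generic two-pass histogram partitioning over hashed group keys (count, exclusive prefix sum, scatter), independent per-partition aggregation, and a cache of partitioned data. A thread pool stands in for the paper's microengines. The names `partition_of`, `histogram_partition`, `aggregate_partitions`, and `GroupedCache`, the `(key, value)` event shape, the CRC32 hash, and the COUNT/SUM aggregates are all hypothetical illustrations.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

# Hypothetical event shape: (group key, measure value).
Event = tuple[str, float]
# Hypothetical aggregate per group: (COUNT, SUM).
Agg = tuple[int, float]


def partition_of(key: str, num_partitions: int) -> int:
    """Map a group key to a partition id with a stable hash (assumed scheme)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def histogram_partition(events: list[Event], num_partitions: int) -> list[list[Event]]:
    """Two-pass histogram partitioning: count, exclusive prefix sum, scatter."""
    pids = [partition_of(key, num_partitions) for key, _ in events]

    # Pass 1: histogram of partition sizes.
    hist = [0] * num_partitions
    for p in pids:
        hist[p] += 1

    # Exclusive prefix sum: start offset of each partition in the output buffer.
    offsets = [0] * num_partitions
    running = 0
    for p in range(num_partitions):
        offsets[p] = running
        running += hist[p]

    # Pass 2: scatter each event into its partition's contiguous slice,
    # preserving input order within a partition.
    out: list[Event] = [("", 0.0)] * len(events)
    cursor = offsets.copy()
    for event, p in zip(events, pids):
        out[cursor[p]] = event
        cursor[p] += 1

    return [out[offsets[p]:offsets[p] + hist[p]] for p in range(num_partitions)]


def aggregate_partition(part: list[Event]) -> dict[str, Agg]:
    """COUNT and SUM per key within one partition."""
    groups: dict[str, Agg] = {}
    for key, value in part:
        count, total = groups.get(key, (0, 0.0))
        groups[key] = (count + 1, total + value)
    return groups


def aggregate_partitions(partitions: list[list[Event]], workers: int = 4) -> dict[str, Agg]:
    """Aggregate partitions independently; a key never spans two partitions."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_partition = list(pool.map(aggregate_partition, partitions))
    result: dict[str, Agg] = {}
    for groups in per_partition:
        result.update(groups)  # key sets are disjoint, so no merge logic is needed
    return result


class GroupedCache:
    """Hypothetical cache of partitioned data, keyed by (dataset id, partition count)."""

    def __init__(self) -> None:
        self._store: dict[tuple[str, int], list[list[Event]]] = {}

    def partitions(self, dataset_id: str, events: list[Event], num_partitions: int) -> list[list[Event]]:
        key = (dataset_id, num_partitions)
        if key not in self._store:
            self._store[key] = histogram_partition(events, num_partitions)
        return self._store[key]


if __name__ == "__main__":
    events = [("alice", 1.0), ("bob", 2.0), ("alice", 3.0), ("carol", 4.0), ("bob", 5.0)]
    cache = GroupedCache()
    parts = cache.partitions("clicks", events, num_partitions=4)
    print(sorted(aggregate_partitions(parts).items()))
    # [('alice', (2, 4.0)), ('bob', (2, 7.0)), ('carol', (1, 4.0))]
```

Because every occurrence of a key lands in exactly one partition, the per-partition results have disjoint key sets and can be combined without a merge step. This illustrates why partitioning first makes the subsequent group aggregation cheap. In CPython, the thread pool only demonstrates that partitions can be processed independently. A pure-Python loop like this will not achieve real multicore speedup because of the global interpreter lock.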