SINCA: Scalable in-memory event aggregation using clustered operators

2015 31st IEEE International Conference on Data Engineering Workshops. Publication date: 2015-04-13. DOI: 10.1109/ICDEW.2015.7129578

M. K. Behera, S. Kalyan, Prasanna Venkatesh, A. Wolski


Citations: 1

Abstract

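Analytical processing of the information generated by social-media operations requires queries that group and aggregate large volumes of detail data. Any advanced query-processing method should take into account two dominant hardware trends: growing main-memory capacity, and growing parallel processing capacity exposed as an increasing number of cores per processor chip. We introduce SINCA, a scalable in-memory data-aggregation method that uses clustered operators to benefit from both trends. The method is built around the concept of a microengine: a set of resources that can be used in parallel with high efficiency. The resulting parallel aggregation algorithm combines low overhead with the ability to handle high data volumes, and it suits both real-time and extract-transform-load (ETL) scenarios. The core idea of the method is to use real-time histograms to partition the data for grouping. Because the data is already grouped during the partitioning phase, the subsequent group aggregation is very efficient. In addition, some of the grouped data can be cached for reuse by subsequent queries.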
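The abstract does not spell out SINCA's algorithm, so the following Python code is only a minimal sketch of the general idea it describes: a histogram-driven partition that leaves rows already grouped, so that each group's aggregation is a single pass over a contiguous range. It uses a standard two-pass, counting-sort style partition (histogram, prefix sum, scatter); whether SINCA's "real-time histograms" work this way is an assumption. The names `histogram_partition`, `aggregate` and `grouped_layout`, the thread pool standing in for microengines, and the per-column layout cache are all hypothetical illustrations, not the authors' implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

# (start, end) half-open range of one group inside the reordered row list.
Range = Tuple[int, int]


def histogram_partition(keys: Sequence[str]) -> Tuple[Dict[str, Range], List[int]]:
    """Two-pass, counting-sort style partition of row ids by grouping key.

    Returns (ranges, order): `order` is a permutation of row ids in which all
    rows of a group are contiguous, and `ranges[key]` is that group's slice.
    """
    # Pass 1: build the histogram (number of rows per grouping key).
    counts: Dict[str, int] = {}
    for key in keys:
        counts[key] = counts.get(key, 0) + 1

    # Prefix sum over the histogram gives each group's starting offset.
    ranges: Dict[str, Range] = {}
    cursor: Dict[str, int] = {}
    offset = 0
    for key, count in counts.items():
        ranges[key] = (offset, offset + count)
        cursor[key] = offset
        offset += count

    # Pass 2: scatter each row id into the next free slot of its group.
    order = [0] * len(keys)
    for row, key in enumerate(keys):
        order[cursor[key]] = row
        cursor[key] += 1
    return ranges, order


def aggregate(
    values: Sequence[float],
    ranges: Dict[str, Range],
    order: List[int],
    agg: Callable[[Iterable[float]], float] = sum,
    workers: int = 4,
) -> Dict[str, float]:
    """Aggregate `values` per group. Groups are independent, so they are
    handed to a thread pool (a hypothetical stand-in for microengines)."""

    def agg_group(item: Tuple[str, Range]) -> Tuple[str, float]:
        key, (start, end) = item
        # Rows of this group are already contiguous in `order`.
        return key, agg(values[order[i]] for i in range(start, end))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(agg_group, ranges.items()))


# Hypothetical cache of grouped layouts, keyed by column name. It assumes the
# column is not modified between queries; a real system would need invalidation.
_layout_cache: Dict[str, Tuple[Dict[str, Range], List[int]]] = {}


def grouped_layout(column: str, keys: Sequence[str]) -> Tuple[Dict[str, Range], List[int]]:
    """Return the cached grouped layout for `column`, building it on first use."""
    if column not in _layout_cache:
        _layout_cache[column] = histogram_partition(keys)
    return _layout_cache[column]


if __name__ == "__main__":
    users = ["a", "b", "a", "c", "b", "a"]  # grouping key per event
    likes = [1, 2, 3, 4, 5, 6]               # measure per event
    ranges, order = grouped_layout("user", users)
    print(aggregate(likes, ranges, order))           # {'a': 10, 'b': 7, 'c': 4}
    # A second query reuses the cached layout instead of re-partitioning.
    ranges, order = grouped_layout("user", users)
    print(aggregate(likes, ranges, order, agg=max))  # {'a': 6, 'b': 5, 'c': 4}
```

Because every group occupies a contiguous range after partitioning, groups can be aggregated independently, which is what makes the work easy to spread across cores. Note that CPython threads do not execute pure-Python code in parallel because of the global interpreter lock, so the thread pool here only illustrates the independence of the per-group work; a native engine would use real parallel workers.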





