Global Shuffle Grouping (GSG): A Load Balancing Strategy for Continuous Range Queries on Storm

2018 IEEE 16th International Conference on Software Engineering Research, Management and Applications (SERA) Pub Date : 2018-06-01 DOI:10.1109/SERA.2018.8477194

Yuqi Zhang, Botao Wang, Jianpeng Zhou, Hanhui Zhong, Xiao Tian


Citations: 2

Abstract

Apache Storm is a distributed stream processing framework that supports real-time processing of big data. Although Storm implements many stream grouping strategies that partition stream data to maximize resource utilization, none of them efficiently supports continuous range queries, which are the basis of location-based services in which both queries and objects are moving. The reason is that these strategies cannot express the spatial semantics of a query (its range and the data distribution), which easily leads to load imbalance. To address this problem, we propose a load-balancing strategy called global shuffle grouping (GSG) to support efficient continuous range queries on Storm. In GSG, the cost of a query is estimated from its range and the density of moving objects. Continuous range queries are grouped by cost in a round-robin manner, and queries belonging to the same group are distributed by a second round-robin driven by a counter array. This double round-robin ensures that load is distributed evenly across the downstream bolts. We implemented a continuous range query topology with GSG in Storm. Compared with shuffle grouping, the most practical built-in grouping strategy, GSG reduces the load imbalance degree and load standard deviation by 2–3 times, reduces load fluctuation by 1–2 times, and improves throughput by up to nearly 20%.
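
The abstract does not give the cost formula or the exact layout of the counter array, so the following is only a minimal sketch of one plausible reading of the double round-robin idea, not the authors' implementation. It assumes a cost model of range area times object density, fixed cost thresholds that define the groups, and one round-robin counter per group with staggered starting positions. The names `RangeQuery`, `estimate_cost`, `DoubleRoundRobin`, and `cost_bounds` are hypothetical, and the sketch does not use the Storm API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RangeQuery:
    width: float
    height: float
    density: float  # assumed: estimated moving objects per unit area near the query


def estimate_cost(q: RangeQuery) -> float:
    # Assumed cost model: expected number of objects inside the query range.
    return q.width * q.height * q.density


class DoubleRoundRobin:
    """Illustrative grouping: bucket queries by cost, then rotate per bucket."""

    def __init__(self, num_tasks: int, cost_bounds: List[float]):
        self.num_tasks = num_tasks
        # Assumed thresholds separating cost groups (e.g. cheap / medium / expensive).
        self.cost_bounds = sorted(cost_bounds)
        # One counter per cost group (our reading of the "counter array").
        # Starts are staggered so different groups begin on different tasks.
        self.counters = [g % num_tasks for g in range(len(self.cost_bounds) + 1)]

    def group_of(self, cost: float) -> int:
        # Index of the first threshold that the cost falls below.
        for i, bound in enumerate(self.cost_bounds):
            if cost < bound:
                return i
        return len(self.cost_bounds)

    def choose_task(self, q: RangeQuery) -> int:
        # Pick the downstream task for this query and advance its group's counter.
        g = self.group_of(estimate_cost(q))
        task = self.counters[g]
        self.counters[g] = (task + 1) % self.num_tasks
        return task


if __name__ == "__main__":
    grouper = DoubleRoundRobin(num_tasks=3, cost_bounds=[10.0, 100.0])
    cheap = RangeQuery(1.0, 1.0, 1.0)
    expensive = RangeQuery(10.0, 10.0, 2.0)
    for q in [cheap, expensive, cheap, expensive, cheap, expensive]:
        print(estimate_cost(q), "->", grouper.choose_task(q))
```

Because each cost group rotates independently, queries of similar cost are spread evenly over the downstream tasks, whereas a single shared round-robin counter could send several expensive queries to the same task when cheap and expensive queries happen to alternate.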
