Global Shuffle Grouping (GSG): A Load Balancing Strategy for Continuous Range Queries on Storm
Authors: Yuqi Zhang, Botao Wang, Jianpeng Zhou, Hanhui Zhong, Xiao Tian
Published in: 2018 IEEE 16th International Conference on Software Engineering Research, Management and Applications (SERA), 2018-06-01
DOI: https://doi.org/10.1109/SERA.2018.8477194
Apache Storm is a distributed stream processing framework that supports real-time processing of big data. Although Storm implements many stream grouping strategies that partition stream data to maximize resource utilization, none of them efficiently supports continuous range queries, which are the basis of location-based services in which both queries and objects move. The reason is that these strategies cannot express the spatial semantics of a query (its range and the data distribution), which easily leads to load imbalance. To address this problem, we propose a load-balancing strategy called global shuffle grouping (GSG) that supports efficient continuous range queries on Storm. In GSG, the cost of a query is estimated from its range and the density of moving objects. Continuous range queries are grouped by cost in a round-robin manner, and the queries within a group are distributed according to a counter array by a second round-robin. This double round-robin keeps the load distributed to multiple downstream bolts balanced. We implemented a continuous range query topology with GSG on Storm. Compared with shuffle grouping, the most practical built-in grouping strategy, GSG reduces the load imbalance degree and load standard deviation by 2–3 times, reduces load fluctuation by 1–2 times, and improves throughput by up to nearly 20%.
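The abstract does not give the cost formula or the layout of the counter array, so the following Python sketch is only one plausible reading of the idea, not the authors' implementation. It assumes that cost is range area times object density, that queries fall into cost groups separated by fixed thresholds, and that the counter array holds one round-robin position per cost group. The names `RangeQuery`, `estimate_cost`, `CostAwareRoundRobin`, and `cost_bounds` are hypothetical, and the staggered starting offsets are an added design choice.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List


@dataclass
class RangeQuery:
    """A rectangular range query (hypothetical representation)."""
    query_id: int
    width: float
    height: float


def estimate_cost(query: RangeQuery, density: float) -> float:
    """Assumed cost model: expected number of objects inside the range."""
    return query.width * query.height * density


class CostAwareRoundRobin:
    """Routes queries so each task receives a similar mix of cost groups.

    Assumption: cost_bounds are thresholds that split costs into
    len(cost_bounds) + 1 groups; counters[g] is the round-robin
    position for group g.
    """

    def __init__(self, num_tasks: int, cost_bounds: List[float]):
        if num_tasks <= 0:
            raise ValueError("num_tasks must be positive")
        self.num_tasks = num_tasks
        self.cost_bounds = sorted(cost_bounds)
        num_groups = len(self.cost_bounds) + 1
        # Stagger the starting task per group so that the first query of
        # every group does not land on task 0 (a choice made for this sketch).
        self.counters = [g % num_tasks for g in range(num_groups)]

    def group_of(self, cost: float) -> int:
        """Index of the cost group that contains the given cost."""
        return bisect_right(self.cost_bounds, cost)

    def choose_task(self, cost: float) -> int:
        """Pick the next task for this cost group and advance its counter."""
        group = self.group_of(cost)
        task = self.counters[group] % self.num_tasks
        self.counters[group] += 1
        return task


if __name__ == "__main__":
    router = CostAwareRoundRobin(num_tasks=3, cost_bounds=[10.0, 100.0])
    density = 0.5  # assumed number of objects per unit area
    load = [0.0] * 3
    for i in range(30):
        query = RangeQuery(i, width=1.0 + i % 7, height=2.0 + i % 5)
        cost = estimate_cost(query, density)
        load[router.choose_task(cost)] += cost
    print(load)  # accumulated estimated cost per downstream task
```

In a real Storm topology this logic would sit inside a custom stream grouping, with the task indices mapped to the target task IDs of the downstream bolt. Plain shuffle grouping balances tuple counts only, whereas a per-group counter also balances how many expensive and cheap queries each task receives.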