T-Storm: Traffic-Aware Online Scheduling in Storm

Jielong Xu, Zhenhua Chen, Jian Tang, Sen Su
{"title":"T-Storm: Traffic-Aware Online Scheduling in Storm","authors":"Jielong Xu, Zhenhua Chen, Jian Tang, Sen Su","doi":"10.1109/ICDCS.2014.61","DOIUrl":null,"url":null,"abstract":"Storm has emerged as a promising computation platform for stream data processing. In this paper, we first show inefficiencies of the current practice of Storm scheduling and challenges associated with applying traffic-aware online scheduling in Storm via experimental results and analysis. Motivated by our observations, we design and implement a new stream data processing system based on Storm, namely, T-Storm. Compared to Storm, T-Storm has the following desirable features: 1) based on runtime states, it accelerates data processing by leveraging effective traffic-aware scheduling for assigning/re-assigning tasks dynamically, which minimizes inter-node and inter-process traffic while ensuring no worker nodes are overloaded, 2) it enables fine-grained control over worker node consolidation such that T-Storm can achieve better performance with even fewer worker nodes, 3) it allows hot-swapping of scheduling algorithms and adjustment of scheduling parameters on the fly, and 4) it is transparent to Storm users (i.e., Storm applications can be ported to run on T-Storm without any changes). We conducted real experiments in a cluster using well-known data processing applications for performance evaluation. Extensive experimental results show that compared to Storm (with the default scheduler), T-Storm can achieve over 84% and 27% speedup on lightly and heavily loaded topologies respectively (in terms of average processing time) with 30% less number of worker nodes.","PeriodicalId":170186,"journal":{"name":"2014 IEEE 34th International Conference on Distributed Computing Systems","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"212","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 IEEE 34th International Conference on Distributed Computing Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDCS.2014.61","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 212

Abstract

Storm has emerged as a promising computation platform for stream data processing. In this paper, we first show inefficiencies of the current practice of Storm scheduling and challenges associated with applying traffic-aware online scheduling in Storm via experimental results and analysis. Motivated by our observations, we design and implement a new stream data processing system based on Storm, namely, T-Storm. Compared to Storm, T-Storm has the following desirable features: 1) based on runtime states, it accelerates data processing by leveraging effective traffic-aware scheduling for assigning/re-assigning tasks dynamically, which minimizes inter-node and inter-process traffic while ensuring no worker nodes are overloaded, 2) it enables fine-grained control over worker node consolidation such that T-Storm can achieve better performance with even fewer worker nodes, 3) it allows hot-swapping of scheduling algorithms and adjustment of scheduling parameters on the fly, and 4) it is transparent to Storm users (i.e., Storm applications can be ported to run on T-Storm without any changes). We conducted real experiments in a cluster using well-known data processing applications for performance evaluation. Extensive experimental results show that compared to Storm (with the default scheduler), T-Storm can achieve over 84% and 27% speedup on lightly and heavily loaded topologies respectively (in terms of average processing time) with 30% less number of worker nodes.
T-Storm: Storm中的流量感知在线调度
Storm已经成为一个很有前途的流数据处理计算平台。在本文中,我们首先通过实验结果和分析显示了当前Storm调度实践的低效率以及在Storm中应用流量感知在线调度相关的挑战。基于我们的观察,我们设计并实现了一个新的基于Storm的流数据处理系统,即T-Storm。与Storm相比,T-Storm有以下可取之处:1)基于运行时状态,它通过利用有效的流量感知调度来动态分配/重新分配任务,从而加速数据处理,从而最大限度地减少节点间和进程间的流量,同时确保没有工作节点过载,2)它可以对工作节点整合进行细粒度控制,这样T-Storm可以在更少的工作节点下实现更好的性能。3)它允许调度算法的热插拔和调度参数的调整,4)它对Storm用户是透明的(即,Storm应用程序可以移植到T-Storm上运行而不需要任何更改)。我们在一个集群中使用知名的数据处理应用程序进行了真实的实验,以进行性能评估。大量的实验结果表明,与Storm(使用默认调度程序)相比,T-Storm在工作节点数量减少30%的情况下,在轻负载和重负载拓扑上分别可以实现超过84%和27%的加速(就平均处理时间而言)。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信