{"title":"T-Storm: Storm中的流量感知在线调度","authors":"Jielong Xu, Zhenhua Chen, Jian Tang, Sen Su","doi":"10.1109/ICDCS.2014.61","DOIUrl":null,"url":null,"abstract":"Storm has emerged as a promising computation platform for stream data processing. In this paper, we first show inefficiencies of the current practice of Storm scheduling and challenges associated with applying traffic-aware online scheduling in Storm via experimental results and analysis. Motivated by our observations, we design and implement a new stream data processing system based on Storm, namely, T-Storm. Compared to Storm, T-Storm has the following desirable features: 1) based on runtime states, it accelerates data processing by leveraging effective traffic-aware scheduling for assigning/re-assigning tasks dynamically, which minimizes inter-node and inter-process traffic while ensuring no worker nodes are overloaded, 2) it enables fine-grained control over worker node consolidation such that T-Storm can achieve better performance with even fewer worker nodes, 3) it allows hot-swapping of scheduling algorithms and adjustment of scheduling parameters on the fly, and 4) it is transparent to Storm users (i.e., Storm applications can be ported to run on T-Storm without any changes). We conducted real experiments in a cluster using well-known data processing applications for performance evaluation. Extensive experimental results show that compared to Storm (with the default scheduler), T-Storm can achieve over 84% and 27% speedup on lightly and heavily loaded topologies respectively (in terms of average processing time) with 30% less number of worker nodes.","PeriodicalId":170186,"journal":{"name":"2014 IEEE 34th International Conference on Distributed Computing Systems","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"212","resultStr":"{\"title\":\"T-Storm: Traffic-Aware Online Scheduling in Storm\",\"authors\":\"Jielong Xu, Zhenhua Chen, Jian Tang, Sen Su\",\"doi\":\"10.1109/ICDCS.2014.61\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Storm has emerged as a promising computation platform for stream data processing. In this paper, we first show inefficiencies of the current practice of Storm scheduling and challenges associated with applying traffic-aware online scheduling in Storm via experimental results and analysis. Motivated by our observations, we design and implement a new stream data processing system based on Storm, namely, T-Storm. Compared to Storm, T-Storm has the following desirable features: 1) based on runtime states, it accelerates data processing by leveraging effective traffic-aware scheduling for assigning/re-assigning tasks dynamically, which minimizes inter-node and inter-process traffic while ensuring no worker nodes are overloaded, 2) it enables fine-grained control over worker node consolidation such that T-Storm can achieve better performance with even fewer worker nodes, 3) it allows hot-swapping of scheduling algorithms and adjustment of scheduling parameters on the fly, and 4) it is transparent to Storm users (i.e., Storm applications can be ported to run on T-Storm without any changes). We conducted real experiments in a cluster using well-known data processing applications for performance evaluation. Extensive experimental results show that compared to Storm (with the default scheduler), T-Storm can achieve over 84% and 27% speedup on lightly and heavily loaded topologies respectively (in terms of average processing time) with 30% less number of worker nodes.\",\"PeriodicalId\":170186,\"journal\":{\"name\":\"2014 IEEE 34th International Conference on Distributed Computing Systems\",\"volume\":\"9 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-06-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"212\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 IEEE 34th International Conference on Distributed Computing Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDCS.2014.61\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 IEEE 34th International Conference on Distributed Computing Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDCS.2014.61","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Storm has emerged as a promising computation platform for stream data processing. In this paper, we first show inefficiencies of the current practice of Storm scheduling and challenges associated with applying traffic-aware online scheduling in Storm via experimental results and analysis. Motivated by our observations, we design and implement a new stream data processing system based on Storm, namely, T-Storm. Compared to Storm, T-Storm has the following desirable features: 1) based on runtime states, it accelerates data processing by leveraging effective traffic-aware scheduling for assigning/re-assigning tasks dynamically, which minimizes inter-node and inter-process traffic while ensuring no worker nodes are overloaded, 2) it enables fine-grained control over worker node consolidation such that T-Storm can achieve better performance with even fewer worker nodes, 3) it allows hot-swapping of scheduling algorithms and adjustment of scheduling parameters on the fly, and 4) it is transparent to Storm users (i.e., Storm applications can be ported to run on T-Storm without any changes). We conducted real experiments in a cluster using well-known data processing applications for performance evaluation. Extensive experimental results show that compared to Storm (with the default scheduler), T-Storm can achieve over 84% and 27% speedup on lightly and heavily loaded topologies respectively (in terms of average processing time) with 30% less number of worker nodes.