Enabling A Load Adaptive Distributed Stream Processing Platform on Synchronized Clusters
Xing Wu, Yan Liu
2014 IEEE International Conference on Cloud Engineering, published 2014-03-11
DOI: 10.1109/IC2E.2014.87 (https://doi.org/10.1109/IC2E.2014.87)
Citations: 4
Abstract
Distributed stream processing (DSP) platforms simplify the development of applications that process continuous, unbounded streams of data at high speed. By leveraging large-scale cluster management frameworks, a DSP platform can scale to analyze data in real time with different types of operators, each running on a cluster node. Scalability and resource utilization depend on how operators are allocated to cluster nodes. Because data volume and rate can be unpredictable, a static mapping between operators and cluster resources results in an unbalanced distribution of operator load. This paper proposes a load-adaptive software layer that sits between a DSP platform and the cluster. It allows an operator to be transferred dynamically to a different cluster node at runtime and keeps the process transparent to developers. We present a prototype implemented on Yahoo's S4. The implementation is evaluated with a top-N topic list application on Twitter streams. The results demonstrate improved stream processing throughput and cluster resource utilization.
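To make the idea of load-adaptive operator placement concrete, the following is a minimal sketch of one possible rebalancing policy. It is not the paper's algorithm and does not use S4's API: the `Node` class, the `rebalance` function, the `threshold` and `max_moves` parameters, and the greedy "move from the hottest node to the coldest node" rule are all assumptions made for illustration. Operator load is modeled as a single number, for example observed events per second.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical cluster node holding operators and their observed loads."""
    name: str
    operators: dict = field(default_factory=dict)  # operator name -> load

    @property
    def load(self) -> float:
        # Total load on the node is the sum of its operators' loads.
        return sum(self.operators.values())


def rebalance(nodes, threshold=1.5, max_moves=10):
    """Greedily migrate operators from the most to the least loaded node.

    A migration is triggered only while the hottest node carries more than
    `threshold` times the mean node load. Returns the list of moves as
    (operator, source node, destination node) tuples.
    """
    moves = []
    for _ in range(max_moves):
        hot = max(nodes, key=lambda n: n.load)
        cold = min(nodes, key=lambda n: n.load)
        mean = sum(n.load for n in nodes) / len(nodes)
        if mean == 0 or hot.load <= threshold * mean:
            break  # cluster is balanced enough under this policy

        gap = hot.load - cold.load
        # Only operators with 0 < load < gap shrink the hot/cold gap when moved.
        candidates = {op: l for op, l in hot.operators.items() if 0 < l < gap}
        if not candidates:
            break

        # Moving a load of gap/2 would equalize the two nodes; pick the closest.
        op = min(candidates, key=lambda o: abs(candidates[o] - gap / 2))
        cold.operators[op] = hot.operators.pop(op)
        moves.append((op, hot.name, cold.name))
    return moves
```

In a real platform the migration step would also have to transfer operator state and reroute in-flight events, which this sketch leaves out; it only illustrates how a runtime layer might decide which operator to move and where.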