Understanding and improving the cost of scaling distributed event processing
Shoaib Akram, M. Marazakis, A. Bilas
Proceedings of the ... International Workshop on Distributed Event-Based Systems, pp. 290-301
Published: 2012-07-16
DOI: https://doi.org/10.1145/2335484.2335516
Citations: 14
Abstract
Building scalable back-end infrastructures for data-centric applications is becoming increasingly important. Applications used in data centres have complex, multilayer software stacks and are required to scale to a large number of nodes, and there is growing interest in improving the efficiency of such stacks. In this paper, we examine the efficiency of one such stack, used for distributed stream processing, an important application domain. We use a specific streaming system, Borealis [10], and extensively hand-tune its end-to-end data path. We focus on the parts of the stack related to intra- and inter-node communication and data exchange, a central component of many software stacks. We find that application-independent code in stream processing middleware performs communication operations that consume a significant number of CPU cycles and are not strictly necessary. We first categorize these operations by the protocol function they support. We then remove them, producing a software stack that is functionally equivalent in terms of application processing. Our results show that restructuring the data path achieves up to 5x higher throughput, reduces energy consumption by up to 60%, and saves up to 40% of infrastructure cost. Finally, we project that with 1024-core processors per node, stream processing applications will demand up to 2 Tbit/s of networking throughput per node.