Yared Dejene Dessalk, Nikolay Nikolov, M. Matskin, A. Soylu, D. Roman
{"title":"Scalable Execution of Big Data Workflows using Software Containers","authors":"Yared Dejene Dessalk, Nikolay Nikolov, M. Matskin, A. Soylu, D. Roman","doi":"10.1145/3415958.3433082","DOIUrl":null,"url":null,"abstract":"Big Data processing involves handling large and complex data sets, incorporating different tools and frameworks as well as other processes that help organisations make sense of their data collected from various sources. This set of operations, referred to as Big Data workflows, require taking advantage of the elasticity of cloud infrastructures for scalability. In this paper, we present the design and prototype implementation of a Big Data workflow approach based on the use of software container technologies and message-oriented middleware (MOM) to enable highly scalable workflow execution. The approach is demonstrated in a use case together with a set of experiments that demonstrate the practical applicability of the proposed approach for the scalable execution of Big Data workflows. Furthermore, we present a scalability comparison of our proposed approach with that of Argo Workflows - one of the most prominent tools in the area of Big Data workflows.","PeriodicalId":198419,"journal":{"name":"Proceedings of the 12th International Conference on Management of Digital EcoSystems","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2020-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 12th International Conference on Management of Digital EcoSystems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3415958.3433082","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7
Abstract
Big Data processing involves handling large and complex data sets, incorporating different tools and frameworks as well as other processes that help organisations make sense of their data collected from various sources. This set of operations, referred to as Big Data workflows, require taking advantage of the elasticity of cloud infrastructures for scalability. In this paper, we present the design and prototype implementation of a Big Data workflow approach based on the use of software container technologies and message-oriented middleware (MOM) to enable highly scalable workflow execution. The approach is demonstrated in a use case together with a set of experiments that demonstrate the practical applicability of the proposed approach for the scalable execution of Big Data workflows. Furthermore, we present a scalability comparison of our proposed approach with that of Argo Workflows - one of the most prominent tools in the area of Big Data workflows.