Energy-Efficient Dynamic Scheduling of Deadline-Constrained MapReduce Workflows
Authors: Tong Shu, C. Wu
Venue: 2017 IEEE 13th International Conference on e-Science (e-Science)
Publication date: 2017-10-01
DOI: 10.1109/eScience.2017.18 (https://doi.org/10.1109/eScience.2017.18)
Citations: 4
Abstract
Big data workflows composed of moldable parallel MapReduce programs running on a large number of processors have become a major consumer of energy in data centers. The degree of parallelism of each moldable job in such workflows has a significant impact on the energy efficiency of parallel computing systems, yet this impact remains largely unexplored. In this paper, we experimentally validate the moldable parallel computing model, in which the dynamic energy consumption of a moldable job increases with the number of parallel tasks. Based on this validation, we construct rigorous cost models and formulate a dynamic scheduling problem for deadline-constrained MapReduce workflows that minimizes energy consumption in Hadoop systems. We propose a semi-dynamic online scheduling algorithm based on adaptive task partitioning that reduces dynamic energy consumption while meeting performance requirements from a global perspective, and we design the corresponding system modules for implementing the algorithm in the Hadoop architecture. Extensive simulation results in Hadoop/YARN show that the proposed algorithm outperforms existing algorithms in terms of dynamic energy saving and deadline violation. The core module, adaptive task partitioning, is further validated through a real-life workflow implementation and experimental results using the Oozie workflow engine in Hadoop/YARN systems.
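
The trade-off described in the abstract can be illustrated with a toy model. This sketch is not the paper's cost model or scheduling algorithm. It assumes a hypothetical job whose total work is split evenly across k tasks, each paying a fixed startup overhead, so that run time falls with k while dynamic energy rises with k. All function names, parameters, and numbers are assumptions made for illustration.

```python
def exec_time(work, k, overhead):
    """Wall-clock time (s) when `work` CPU-seconds are split evenly into k
    parallel tasks, each paying a fixed per-task `overhead` (assumed model)."""
    return work / k + overhead


def dynamic_energy(work, k, overhead, power):
    """Dynamic energy (J): every task draws `power` watts while it runs, so
    the per-task overhead makes total energy grow with k (assumed model)."""
    return power * (work + k * overhead)


def min_energy_parallelism(work, overhead, power, deadline, max_tasks):
    """Return (k, energy) for the smallest k that meets the deadline.

    Under the toy model, time decreases and energy increases with k, so the
    smallest feasible k is also the one with the least dynamic energy.
    Returns None if no k up to max_tasks meets the deadline.
    """
    for k in range(1, max_tasks + 1):
        if exec_time(work, k, overhead) <= deadline:
            return k, dynamic_energy(work, k, overhead, power)
    return None


if __name__ == "__main__":
    # Hypothetical job: 1200 CPU-seconds of work, 10 s overhead per task,
    # 50 W of dynamic power per task, 130 s deadline, at most 64 tasks.
    print(min_energy_parallelism(1200, 10, 50, 130, 64))
```

Under these assumptions, adding tasks beyond the minimum needed for the deadline only adds energy, which is the intuition behind tuning each job's degree of parallelism rather than always maximizing it.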