Boosting energy efficiency with mirrored data block replication policy and energy scheduler
Sara Arbab Yazd, S. Venkatesan, N. Mittal
Published in: ACM SIGOPS Operating Systems Review, 23 July 2013
DOI: 10.1145/2506164.2506171
Citations: 16
Abstract
Energy efficiency is one of the major challenges in large datacenters. To process large data sets in a distributed fashion, these datacenters employ the MapReduce programming model. Hadoop is an open-source implementation of MapReduce that includes a distributed file system, the Hadoop Distributed File System (HDFS), which uses a data block replication scheme to preserve reliability and data availability. Replicas of each data block are distributed over the nodes randomly, subject to some constraints (e.g., no two replicas of the same block may be stored on a single node). This study exploits the flexibility in the data block placement policy to increase energy efficiency in datacenters. Furthermore, inspired by Zaharia et al.'s delay scheduling algorithm, it introduces a scheduling algorithm that takes energy efficiency into account in addition to fairness and data locality. Computer simulations of the proposed method suggest that it outperforms Hadoop's standard settings.
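To make the placement idea concrete, here is a minimal sketch contrasting random, constraint-respecting replica placement with a structured alternative. The abstract does not define the "mirrored" policy named in the title, so the group-based scheme below is only a hypothetical illustration. It assumes nodes are split into as many disjoint groups as there are replicas, with replica i of every block placed in group i. Under that assumption, a single group holds a complete copy of the data, so the remaining nodes could in principle be powered down under light load. The function names, the 12-node cluster, and the 1,000-block workload are all illustrative assumptions.

```python
import random

def random_placement(num_blocks, nodes, replication=3, seed=0):
    """Baseline: each block's replicas go to `replication` distinct nodes chosen
    uniformly at random, so no node stores two replicas of the same block.
    (Real HDFS placement is also rack-aware; racks are ignored here.)"""
    rng = random.Random(seed)
    return {b: rng.sample(nodes, replication) for b in range(num_blocks)}

def mirrored_placement(num_blocks, nodes, replication=3, seed=0):
    """Hypothetical group-based placement (an assumption, not the paper's policy):
    split the nodes into `replication` disjoint groups and put replica i of every
    block on a random node of group i. Because the groups are disjoint, the
    replicas of a block still land on distinct nodes."""
    if len(nodes) < replication:
        raise ValueError("need at least as many nodes as replicas")
    rng = random.Random(seed)
    groups = [nodes[i::replication] for i in range(replication)]
    return {b: [rng.choice(g) for g in groups] for b in range(num_blocks)}

def covers_all_blocks(placement, active_nodes):
    """True if every block still has at least one replica on an active node."""
    active = set(active_nodes)
    return all(any(n in active for n in replicas) for replicas in placement.values())

if __name__ == "__main__":
    nodes = list(range(12))
    group0 = nodes[0::3]  # keep only 4 of the 12 nodes powered on
    print(covers_all_blocks(mirrored_placement(1000, nodes), group0))  # True
    print(covers_all_blocks(random_placement(1000, nodes), group0))
```

Under the grouped layout, keeping only group 0 powered on leaves every block available by construction. Under random placement, each block independently has a probability of C(8,3)/C(12,3) = 56/220 ≈ 0.25 of having no replica among those same 4 nodes, so covering all 1,000 blocks is practically impossible.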
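The scheduler builds on Zaharia et al.'s delay scheduling. As background, here is a minimal sketch of that baseline policy only. The abstract does not describe how the energy-aware variant changes it, so no energy term is modeled. When a node has a free slot, jobs are visited in fairness order (fewest running tasks first). A job launches a data-local task if it has one. Otherwise it is skipped, and only after being skipped a threshold number of times may it launch a non-local task. The `Job` class, `assign_task`, and `max_skips` are illustrative names, not Hadoop APIs.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    # Unlaunched tasks as (task_id, frozenset of node ids holding the task's input block).
    pending: list
    running: int = 0
    skipcount: int = 0

def assign_task(node, jobs, max_skips):
    """Pick a task for a free slot on `node` using delay scheduling.

    A job with a data-local task on `node` launches it and resets its skip count.
    A job without one is skipped (its skip count grows) until it has been skipped
    `max_skips` times, after which it may launch a non-local task.
    Returns (job name, task_id, is_local), or None if the slot stays idle.
    """
    # Fairness order: fewest running tasks first; the stable sort keeps ties in list order.
    for job in sorted(jobs, key=lambda j: j.running):
        if not job.pending:
            continue
        local = [t for t in job.pending if node in t[1]]
        if local:
            task = local[0]
            job.skipcount = 0
        elif job.skipcount >= max_skips:
            task = job.pending[0]  # waited long enough: give up on locality
        else:
            job.skipcount += 1  # skip this job for now, hoping for a local slot later
            continue
        job.pending.remove(task)
        job.running += 1
        return job.name, task[0], bool(local)
    return None
```

For example, with job A's only task stored on nodes 1 and 2, job B's task 7 stored on node 3, and `max_skips=2`, a free slot on node 3 skips A once and launches B's task locally. The next slot on node 4 skips A again and stays idle. A third slot on node 4 finally runs A's task non-locally. The threshold trades a short wait for a better chance of data locality.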