On a Dynamic Data Placement Strategy for Heterogeneous Hadoop Clusters
Yang Liu, C. Wu, M. Wang, Aiqin Hou, Yongqiang Wang
2018 International Symposium on Networks, Computers and Communications (ISNCC), 2018-06-01
DOI: 10.1109/ISNCC.2018.8530970
Citations: 6
Abstract
Hadoop is one of the most popular distributed systems for big data computing in both industry and the scientific community. The default data placement strategy of the Hadoop Distributed File System (HDFS), which was designed for homogeneous environments, can degrade performance when deployed in heterogeneous clusters composed of data nodes with disparate computing power and disk capacity, undermining the performance of MapReduce applications. In this paper, we use a Grey Forecast model to predict data hotness dynamically and determine an appropriate number of data block replicas on the fly. Based on this information, we further propose a dynamic data placement strategy (DDPS) that decides the best location for new replicas according to their hotness. The proposed method dynamically adjusts the data replicas stored on each node in a heterogeneous Hadoop cluster and reduces the response time of big data applications. Experimental results on a heterogeneous Hadoop cluster show that DDPS, together with the prediction model, significantly increases application execution efficiency and improves MapReduce performance over the default HDFS configuration.
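
The abstract does not spell out the forecasting model, so the following is only a minimal sketch of how hotness prediction could drive replica counts. It assumes the standard GM(1,1) grey model, a common grey forecasting variant, which may differ from the model used in the paper. The function names (`gm11_forecast`, `replicas_for`), the parameters `accesses_per_replica` and `max_replicas`, and the sample access history are all hypothetical illustrations; only the floor of 3 replicas reflects a known fact, namely the default HDFS replication factor.

```python
import math


def gm11_forecast(series, steps=1):
    """Forecast `steps` future values of a non-negative series with GM(1,1)."""
    values = [float(x) for x in series]
    n = len(values)
    if n < 4:
        raise ValueError("GM(1,1) is usually fitted on at least 4 observations")

    # Accumulated generating operation: running sums of the raw series.
    acc = []
    total = 0.0
    for x in values:
        total += x
        acc.append(total)

    # Background values: means of consecutive accumulated values.
    z = [0.5 * (acc[k] + acc[k - 1]) for k in range(1, n)]
    y = values[1:]

    # Least-squares fit of y = -a * z + b using the 2x2 normal equations.
    m = n - 1
    sz = sum(z)
    szz = sum(v * v for v in z)
    sy = sum(y)
    szy = sum(zi * yi for zi, yi in zip(z, y))
    det = m * szz - sz * sz
    if det == 0:
        raise ValueError("series is degenerate; GM(1,1) cannot be fitted")
    a = (sz * sy - m * szy) / det
    b = (szz * sy - sz * szy) / det

    # With a (near) zero development coefficient the model is flat at b.
    if abs(a) < 1e-12:
        return [b] * steps

    def acc_hat(k):
        # Time response of the fitted model for the accumulated series.
        return (values[0] - b / a) * math.exp(-a * k) + b / a

    # Difference the accumulated predictions to recover raw-series forecasts.
    return [acc_hat(k) - acc_hat(k - 1) for k in range(n, n + steps)]


def replicas_for(predicted_accesses, accesses_per_replica=10.0,
                 min_replicas=3, max_replicas=10):
    """Hypothetical mapping from predicted hotness to a replica count."""
    wanted = math.ceil(max(predicted_accesses, 0.0) / accesses_per_replica)
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    # Hypothetical per-interval access counts for one data block.
    history = [10, 12, 14.4, 17.28, 20.736]
    predicted = gm11_forecast(history)[0]
    print(round(predicted, 2), replicas_for(predicted))
```

In this sketch, a block whose predicted access count rises gets more replicas, up to a cap, while a cold block stays at the default of 3. Choosing which nodes receive the additional replicas is the placement step that DDPS addresses, and it is not modeled here.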