A Heterogeneity-aware Data Distribution and Rebalance Method in Hadoop Cluster
Yuanquan Fan, Weiguo Wu, Haijun Cao, Huo Zhu, Xu Zhao, Wei Wei
2012 Seventh ChinaGrid Annual Conference, 20 September 2012. DOI: 10.1109/ChinaGrid.2012.22
The current Hadoop implementation assumes that the computing nodes in a cluster are homogeneous. Because input data are split into blocks of a predefined size and placed without regard to each node's computing capability, Hadoop suffers performance degradation during the Map phase in a heterogeneous cluster. To address this problem, we propose a heterogeneity-aware data distribution and rebalance method for heterogeneous Hadoop clusters. The method has two components: 1) performance-aware data distribution and 2) dynamic data migration. Experimental results indicate that the method improves Map-phase performance in heterogeneous clusters and also enhances the data locality of Map tasks.
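To make the two components concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' algorithm. Each node's `speed` is a hypothetical measured Map throughput in blocks per unit time; `distribute` assigns block counts proportional to speed (largest-remainder rounding), and `rebalance` is a simple greedy stand-in for dynamic migration that moves one block at a time from the node expected to finish last to the node that would finish earliest after receiving it. Block size, replication, HDFS placement rules and the network cost of migration are all ignored.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    speed: float     # hypothetical: measured Map throughput, blocks per unit time
    blocks: int = 0  # number of data blocks currently stored on this node


def finish_time(node: Node) -> float:
    """Estimated time for a node to process all of its local blocks."""
    return node.blocks / node.speed


def distribute(nodes: list[Node], total_blocks: int) -> list[int]:
    """Performance-aware placement: block counts proportional to node speed.

    Uses largest-remainder rounding so the counts sum exactly to total_blocks.
    """
    total_speed = sum(n.speed for n in nodes)
    shares = [total_blocks * n.speed / total_speed for n in nodes]
    counts = [int(s) for s in shares]
    leftover = total_blocks - sum(counts)
    # Give the leftover blocks to the nodes with the largest fractional shares.
    order = sorted(range(len(nodes)), key=lambda i: shares[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    for node, count in zip(nodes, counts):
        node.blocks = count
    return counts


def rebalance(nodes: list[Node]) -> list[tuple[str, str]]:
    """Greedy migration after speeds change (hypothetical stand-in).

    Repeatedly moves one block from the node that would finish last to the
    node that would finish earliest after receiving it, as long as the
    receiver would still finish before the current bottleneck.
    """
    moves = []
    while True:
        slow = max(nodes, key=finish_time)
        fast = min(nodes, key=lambda n: (n.blocks + 1) / n.speed)
        if slow is fast or slow.blocks == 0:
            break
        # Stop once the receiver would finish no earlier than the bottleneck.
        if (fast.blocks + 1) / fast.speed >= finish_time(slow):
            break
        slow.blocks -= 1
        fast.blocks += 1
        moves.append((slow.name, fast.name))
    return moves
```

For example, with speeds 1, 2 and 5 and 16 blocks, `distribute` returns `[2, 4, 10]`. If the third node's measured speed later drops to 2.5, `rebalance` performs three migrations and leaves `[3, 6, 7]` blocks, lowering the estimated Map finish time from 4.0 to 3.0. Moving one block per step keeps each migration decision simple; a real system would also weigh the transfer cost of each move.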