Preventing Data Popularity Concentration in HDFS based Cloud Storage
T. Shwe, M. Aritsugi
Proceedings of the 12th IEEE/ACM International Conference on Utility and Cloud Computing Companion, 2019-12-02
DOI: 10.1145/3368235.3368843 (https://doi.org/10.1145/3368235.3368843)
Citations: 1
Abstract
The Hadoop Distributed File System (HDFS) often develops skew in data storage over time, mainly because of its random data block allocation policy, datanode failures, replica reconstruction, and client activity, which leads to utilization and load imbalance in the system. Although HDFS provides a tool to rebalance the data in the cluster, the balancer considers only disk space utilization, moving data from highly utilized nodes to less utilized ones. Data access skew, which arises when a large amount of popular data piles up on one node, is therefore not addressed by the default HDFS balancer. To address this issue, we present a popularity-aware balancer based on a node popularity score, which spreads popular data uniformly among datanodes, balancing future access load and reducing hot spots in the cloud storage system. Simulation results show that the proposed popularity-aware balancer distributes popular data uniformly across nodes without compromising the amount of data transferred or the variance in disk space.
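The idea of balancing on a node popularity score can be illustrated with a minimal sketch. The abstract does not define the score or the block selection rule, so everything below is an assumption made for illustration: the score is taken to be the sum of per-block access counts on a node, blocks are moved greedily from the highest-scoring node to the lowest-scoring one, and `threshold` and `max_moves` are hypothetical parameters. Replication, disk space limits, and transfer cost, all of which the paper considers, are ignored here.

```python
def node_scores(placement, access_counts):
    """Assumed score: sum of access counts of the blocks stored on each node."""
    return {node: sum(access_counts.get(b, 0) for b in blocks)
            for node, blocks in placement.items()}


def rebalance_by_popularity(placement, access_counts, threshold=0.1, max_moves=1000):
    """Greedily move blocks from the most popular node to the least popular one.

    placement:     dict mapping node name -> set of block ids (not mutated)
    access_counts: dict mapping block id -> observed access count
    threshold:     hypothetical tolerance; stop once the hottest node's score
                   is within this fraction of the mean score
    Returns (new_placement, moves), where moves is a list of (block, src, dst).
    """
    placement = {node: set(blocks) for node, blocks in placement.items()}
    moves = []
    if not placement:
        return placement, moves

    for _ in range(max_moves):
        scores = node_scores(placement, access_counts)
        mean = sum(scores.values()) / len(scores)
        src = max(scores, key=scores.get)  # hottest node
        dst = min(scores, key=scores.get)  # coldest node
        if scores[src] - mean <= threshold * mean:
            break  # hottest node is close enough to the mean

        gap = scores[src] - scores[dst]
        # Only blocks whose move strictly narrows the gap are candidates;
        # this guarantees the spread of scores shrinks with every move.
        candidates = [b for b in sorted(placement[src])
                      if b not in placement[dst]
                      and 0 < access_counts.get(b, 0) < gap]
        if not candidates:
            break

        # Prefer the block whose popularity is closest to half the gap.
        block = min(candidates, key=lambda b: abs(access_counts[b] - gap / 2))
        placement[src].remove(block)
        placement[dst].add(block)
        moves.append((block, src, dst))

    return placement, moves


if __name__ == "__main__":
    # Made-up example: one datanode holds nearly all of the popular blocks.
    placement = {"dn1": {"a", "b", "c", "d"}, "dn2": {"e"}, "dn3": {"f"}}
    access = {"a": 50, "b": 30, "c": 10, "d": 10, "e": 5, "f": 5}
    new_placement, moves = rebalance_by_popularity(placement, access)
    print(node_scores(placement, access))
    print(node_scores(new_placement, access))
    print(moves)
```

A disk-space-only balancer would leave this example largely as it is if the blocks were of similar size, whereas a score-driven pass lowers the peak score on the hottest node. A fuller version would also have to bound the bytes moved and keep disk utilization even, as the abstract notes.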