Preventing Data Popularity Concentration in HDFS based Cloud Storage

T. Shwe, M. Aritsugi
Published in: Proceedings of the 12th IEEE/ACM International Conference on Utility and Cloud Computing Companion
Publication date: 2019-12-02
DOI: 10.1145/3368235.3368843 (https://doi.org/10.1145/3368235.3368843)

Abstract

The Hadoop Distributed File System (HDFS) often develops skew in data storage over time, mainly because of its random data block allocation policy, datanode failures, replica reconstruction, and client activity, which leads to utilization and load imbalance in the system. Although HDFS provides a tool to rebalance data in the cluster, the balancer considers only disk space utilization, re-allocating data from highly utilized nodes to less utilized ones. Data access skew, which arises when a large amount of popular data piles up on one node, is therefore not addressed by the default HDFS balancer. To address this issue, we present a popularity-aware balancer, based on a node popularity score, that spreads popular data uniformly among datanodes, balancing future access load and reducing hot spots in the cloud storage system. Simulation results demonstrate the promising benefits of the proposed popularity-aware balancer: popular data is distributed uniformly across nodes without compromising the amount of data transferred or the variance in disk space.
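
The abstract does not define the node popularity score or say how blocks are selected for movement, so the following is only an illustrative sketch, not the authors' algorithm. It assumes that a block's popularity is its recent access count, that a node's score is the sum over the blocks it stores, and that the balancer greedily swaps equal-sized blocks between the most and least popular nodes so that disk usage stays unchanged. All names (`Block`, `DataNode`, `popularity_rebalance`) are hypothetical, and replica placement constraints are ignored.

```python
from dataclasses import dataclass, field
from itertools import product
from statistics import pvariance


@dataclass
class Block:
    block_id: str
    size_mb: int
    accesses: int  # assumed popularity signal: recent read count


@dataclass
class DataNode:
    name: str
    blocks: list = field(default_factory=list)

    def popularity(self):
        # Assumed node popularity score: total accesses of stored blocks.
        return sum(b.accesses for b in self.blocks)

    def used_mb(self):
        return sum(b.size_mb for b in self.blocks)


def popularity_rebalance(nodes, max_swaps=100):
    """Greedily swap equal-sized blocks between the hottest and coldest node.

    A swap is accepted only if it strictly narrows the popularity gap
    between the two nodes, so the variance of node scores never increases.
    """
    swaps = []
    for _ in range(max_swaps):
        hot = max(nodes, key=DataNode.popularity)
        cold = min(nodes, key=DataNode.popularity)
        gap = hot.popularity() - cold.popularity()
        # Swapping x (hot) with y (cold) changes the gap to |gap - 2*d|,
        # where d = x.accesses - y.accesses; it helps only if 0 < d < gap.
        candidates = [
            (x, y)
            for x, y in product(hot.blocks, cold.blocks)
            if x.size_mb == y.size_mb and 0 < x.accesses - y.accesses < gap
        ]
        if not candidates:
            break
        x, y = min(
            candidates,
            key=lambda p: abs(gap - 2 * (p[0].accesses - p[1].accesses)),
        )
        hot.blocks.remove(x)
        cold.blocks.remove(y)
        hot.blocks.append(y)
        cold.blocks.append(x)
        swaps.append((x.block_id, hot.name, y.block_id, cold.name))
    return swaps


def make_cluster():
    # Toy cluster: disk usage is identical on every node, so a balancer that
    # looks only at disk space would see nothing to do, yet dn1 is a hot spot.
    counts = {
        "dn1": [("a", 50), ("b", 40), ("c", 10)],
        "dn2": [("d", 5), ("e", 3), ("f", 2)],
        "dn3": [("g", 20), ("h", 10), ("i", 0)],
    }
    return [
        DataNode(name, [Block(bid, 128, acc) for bid, acc in blocks])
        for name, blocks in counts.items()
    ]


nodes = make_cluster()
before = pvariance([n.popularity() for n in nodes])
swaps = popularity_rebalance(nodes)
after = pvariance([n.popularity() for n in nodes])
print("popularity variance before:", before, "after:", after)
print("swaps:", swaps)
```

Swapping rather than moving blocks is a design choice made for this sketch: with fixed-size blocks it leaves each node's disk usage untouched, which mirrors the abstract's goal of not compromising the variance in disk space. A real balancer would also need to bound the amount of data transferred and respect HDFS replica placement rules.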