Storage and Rack Sensitive Replica Placement Algorithm for Distributed Platform with Data as Files

Vinay Venkataramanachary, Enrique Reveron, Wei Shi
DOI: 10.1109/COMSNETS48256.2020.9027494
Published in: 2020 International Conference on COMmunication Systems & NETworkS (COMSNETS)
Publication date: 2020-01-01
Citations: 3

Abstract

The Distributed File System (DFS) is a key component of cloud and data center networking. Frequent hardware failures and network bottlenecks in the underlying infrastructure significantly degrade performance. Replication provides enhanced fault tolerance by storing multiple copies of a single data block. On the Hadoop platform, the Hadoop Distributed File System (HDFS) handles data storage and provides replica placement services. The default HDFS replica engine adopts a simple rack-aware policy designed to improve fault tolerance by storing data blocks across multiple racks. However, the HDFS replica engine does not consider key performance indicators of data center resources, such as rack utilization and node storage utilization. Furthermore, HDFS stores data as uniformly divided small blocks, which increases traffic flow when an entire file is accessed and therefore degrades response time. In this research, we propose a Storage and Rack Sensitive (SRS) replica placement algorithm that aims to improve the rack and storage utilization of data center resources. The proposed algorithm also attempts to optimize traffic flow during file access by storing data as original files instead of small uniform blocks. Experimental results of the proposed SRS algorithm were compared against the default HDFS replica distribution, and significant improvements in rack utilization and storage utilization were observed. Furthermore, recent literature confirms that the “Data as a File” approach indeed decreases the amount of data flow caused by file access traffic.
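The abstract describes the SRS algorithm only at a high level: replicas are placed with awareness of rack utilization and node storage utilization, and each copy stores the whole file rather than uniform blocks. The sketch below is an illustrative heuristic in that spirit, not the paper's actual method; the specific ranking rules (least-utilized rack first, then the node with the most free space that can hold the entire file) are assumptions made for illustration.

```python
def place_replicas(racks, file_size, replicas=3):
    """Pick (rack, node) targets for `replicas` whole-file copies.

    `racks` maps rack id -> {node id -> (used, capacity)}.
    Returns a list of (rack_id, node_id) pairs, at most one
    replica per rack, preferring the least-utilized racks.
    NOTE: an assumed heuristic, not the published SRS algorithm.
    """
    def rack_util(nodes):
        # Aggregate storage utilization of a rack (used / capacity).
        used = sum(u for u, _ in nodes.values())
        cap = sum(c for _, c in nodes.values())
        return used / cap if cap else 1.0

    # Rank racks by current utilization, least loaded first.
    ranked_racks = sorted(racks, key=lambda r: rack_util(racks[r]))
    placement = []
    for rack_id in ranked_racks:
        if len(placement) == replicas:
            break
        nodes = racks[rack_id]
        # Within the rack, consider only nodes with enough free space
        # for the entire file ("data as a file", no block splitting).
        candidates = [(n, c - u) for n, (u, c) in nodes.items()
                      if c - u >= file_size]
        if not candidates:
            continue  # rack cannot host the whole file; skip it
        # Pick the node with the most free space.
        node_id = max(candidates, key=lambda x: x[1])[0]
        placement.append((rack_id, node_id))
    return placement
```

Placing three replicas of a 20-unit file across three racks with utilizations 0%, 30%, and 85% would, under this heuristic, select the emptiest node in each rack in that order, keeping every replica on a distinct rack as the default HDFS policy also intends.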