Storage and Rack Sensitive Replica Placement Algorithm for Distributed Platform with Data as Files

Vinay Venkataramanachary, Enrique Reveron, Wei Shi
DOI: 10.1109/COMSNETS48256.2020.9027494
Published in: 2020 International Conference on COMmunication Systems & NETworkS (COMSNETS)
Publication date: 2020-01-01
Citations: 3

Abstract

The Distributed File System (DFS) is a key component of cloud and data center networking. Frequent hardware failures and network bottlenecks in the underlying infrastructure significantly degrade performance. Replication provides enhanced fault tolerance by storing multiple copies of a single data block. On the Hadoop platform, the Hadoop Distributed File System (HDFS) handles data storage and provides replica placement services. The default HDFS replica engine adopts a simple rack-aware policy designed to improve fault tolerance by storing data blocks across multiple racks. However, the HDFS replica engine does not consider key performance indicators of data center resources, such as rack utilization and node storage utilization. Furthermore, HDFS stores data as uniformly divided small blocks, which increases traffic flow when an entire file is accessed and therefore degrades response time. In this research, we propose a Storage and Rack Sensitive (SRS) replica placement algorithm that aims to improve the rack and storage utilization of data center resources. The proposed algorithm also attempts to optimize traffic flow during file access by storing data as original files instead of small uniform blocks. Experimental results of the proposed SRS algorithm were compared against the default HDFS replica distribution, and significant improvements in rack utilization and storage utilization were observed. Furthermore, recent literature confirms that the “Data as a File” approach indeed decreases the amount of data flow caused by file access traffic.
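The abstract describes the SRS algorithm only at a high level: replicas are placed with awareness of rack utilization and node storage utilization, and each copy stores the whole file rather than uniform blocks. The sketch below is an illustrative heuristic in that spirit, not the paper's actual method; the specific ranking rules (least-utilized rack first, then the node with the most free space that can hold the entire file) are assumptions made for illustration.

```python
def place_replicas(racks, file_size, replicas=3):
    """Pick (rack, node) targets for `replicas` whole-file copies.

    `racks` maps rack id -> {node id -> (used, capacity)}.
    Returns a list of (rack_id, node_id) pairs, at most one
    replica per rack, preferring the least-utilized racks.
    NOTE: an assumed heuristic, not the published SRS algorithm.
    """
    def rack_util(nodes):
        # Aggregate storage utilization of a rack (used / capacity).
        used = sum(u for u, _ in nodes.values())
        cap = sum(c for _, c in nodes.values())
        return used / cap if cap else 1.0

    # Rank racks by current utilization, least loaded first.
    ranked_racks = sorted(racks, key=lambda r: rack_util(racks[r]))
    placement = []
    for rack_id in ranked_racks:
        if len(placement) == replicas:
            break
        nodes = racks[rack_id]
        # Within the rack, consider only nodes with enough free space
        # for the entire file ("data as a file", no block splitting).
        candidates = [(n, c - u) for n, (u, c) in nodes.items()
                      if c - u >= file_size]
        if not candidates:
            continue  # rack cannot host the whole file; skip it
        # Pick the node with the most free space.
        node_id = max(candidates, key=lambda x: x[1])[0]
        placement.append((rack_id, node_id))
    return placement
```

Placing three replicas of a 20-unit file across three racks with utilizations 0%, 30%, and 85% would, under this heuristic, select the emptiest node in each rack in that order, keeping every replica on a distinct rack as the default HDFS policy also intends.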