{"title":"HDFS Heterogeneous Storage Resource Management Based on Data Temperature","authors":"Rohith Subramanyam","doi":"10.1109/ICCAC.2015.33","DOIUrl":null,"url":null,"abstract":"Hadoop has traditionally been used as a large-scale batch processing system. However, interactive applications such as Facebook Messenger are becoming increasingly prominent in the Hadoop world. A key bottleneck in adapting Hadoop to real-time processing is disk data transfer rate. The advent of Solid State Drives (SSDs) holds great promise in this regard as they provide bandwidth on the orders of magnitude better than that of rotating disks. But due to their higher cost per gigabyte, a common approach is to have heterogeneous storage types. This paper presents a Storage Resource Management technique that automatically and dynamically moves data across this tiered storage based on Data Temperature, migrating \"hot\" data towards faster storage and \"cold\" data towards inexpensive archival storage. Thus, the cluster adapts based on the characteristics of the workloads over time to make effective use of the scarce expensive storage. Finally, I evaluate my modified version of the Hadoop Distributed File System (HDFS) against the vanilla version to compare their performances. The results are promising and show an improvement in both read and write performance with a significant improvement in read performance.","PeriodicalId":133491,"journal":{"name":"2015 International Conference on Cloud and Autonomic Computing","volume":"2014 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 International Conference on Cloud and Autonomic Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCAC.2015.33","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 8
Abstract
Hadoop has traditionally been used as a large-scale batch processing system. However, interactive applications such as Facebook Messenger are becoming increasingly prominent in the Hadoop world. A key bottleneck in adapting Hadoop to real-time processing is the disk data transfer rate. The advent of Solid State Drives (SSDs) holds great promise in this regard, as they provide bandwidth that is orders of magnitude higher than that of rotating disks. But due to their higher cost per gigabyte, a common approach is to use heterogeneous storage types. This paper presents a Storage Resource Management technique that automatically and dynamically moves data across this tiered storage based on Data Temperature, migrating "hot" data towards faster storage and "cold" data towards inexpensive archival storage. Thus, the cluster adapts to the characteristics of its workloads over time to make effective use of the scarce, expensive storage. Finally, I evaluate my modified version of the Hadoop Distributed File System (HDFS) against the vanilla version to compare their performance. The results are promising and show an improvement in both read and write performance, with the gain in read performance being especially significant.
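To make the idea of temperature-driven tiering concrete, the sketch below shows one possible way to score a file's temperature from access frequency and recency and map it to one of HDFS's built-in storage policies (ALL_SSD, HOT, COLD). This is a minimal illustration, not the paper's implementation: the scoring formula, thresholds, class and field names are assumptions introduced here for clarity.

```java
// Illustrative sketch only: a simple data-temperature score and tier decision.
// The formula, thresholds, and the FileStats class are assumptions for
// illustration; only the HDFS policy names and CLI tools mentioned are real.

import java.time.Duration;
import java.time.Instant;

public class TemperatureTiering {

    // Hypothetical per-file access statistics collected over a sliding window.
    static class FileStats {
        long accessesInWindow;   // reads observed in the current window
        Instant lastAccess;      // time of the most recent read
    }

    // A simple temperature score: frequently and recently read files are "hot".
    // Recency decays exponentially with an assumed half-life of 24 hours.
    static double temperature(FileStats s, Instant now) {
        double ageHours = Duration.between(s.lastAccess, now).toMinutes() / 60.0;
        double recencyDecay = Math.pow(0.5, ageHours / 24.0);
        return s.accessesInWindow * recencyDecay;
    }

    // Map a temperature to one of HDFS's standard storage policies.
    // Thresholds are placeholders; ALL_SSD, HOT, and COLD are real policy
    // names (SSD, DISK, and ARCHIVE replicas respectively).
    static String choosePolicy(double temp) {
        if (temp > 100.0) return "ALL_SSD"; // hottest data -> SSD tier
        if (temp > 1.0)   return "HOT";     // warm data    -> spinning disks
        return "COLD";                      // cold data    -> archival storage
    }

    public static void main(String[] args) {
        FileStats stats = new FileStats();
        stats.accessesInWindow = 500;
        stats.lastAccess = Instant.now().minus(Duration.ofHours(2));

        double temp = temperature(stats, Instant.now());
        System.out.printf("temperature=%.1f policy=%s%n", temp, choosePolicy(temp));
        // On a real cluster, the chosen policy would be applied with
        // `hdfs storagepolicies -setStoragePolicy` and enforced by `hdfs mover`.
    }
}
```

In stock HDFS, storage policies are applied manually per path and block migration is triggered by the mover tool; the technique described in the abstract differs in that the migration decisions are made automatically and continuously from observed access patterns.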