{"title":"HDFS Heterogeneous Storage Resource Management Based on Data Temperature","authors":"Rohith Subramanyam","doi":"10.1109/ICCAC.2015.33","DOIUrl":null,"url":null,"abstract":"Hadoop has traditionally been used as a large-scale batch processing system. However, interactive applications such as Facebook Messenger are becoming increasingly prominent in the Hadoop world. A key bottleneck in adapting Hadoop to real-time processing is disk data transfer rate. The advent of Solid State Drives (SSDs) holds great promise in this regard as they provide bandwidth on the orders of magnitude better than that of rotating disks. But due to their higher cost per gigabyte, a common approach is to have heterogeneous storage types. This paper presents a Storage Resource Management technique that automatically and dynamically moves data across this tiered storage based on Data Temperature, migrating \"hot\" data towards faster storage and \"cold\" data towards inexpensive archival storage. Thus, the cluster adapts based on the characteristics of the workloads over time to make effective use of the scarce expensive storage. Finally, I evaluate my modified version of the Hadoop Distributed File System (HDFS) against the vanilla version to compare their performances. The results are promising and show an improvement in both read and write performance with a significant improvement in read performance.","PeriodicalId":133491,"journal":{"name":"2015 International Conference on Cloud and Autonomic Computing","volume":"2014 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 International Conference on Cloud and Autonomic Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCAC.2015.33","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 8
Abstract
Hadoop has traditionally been used as a large-scale batch processing system. However, interactive applications such as Facebook Messenger are becoming increasingly prominent in the Hadoop world. A key bottleneck in adapting Hadoop to real-time processing is the disk data transfer rate. The advent of Solid State Drives (SSDs) holds great promise in this regard, as they provide bandwidth that is orders of magnitude higher than that of rotating disks. But due to their higher cost per gigabyte, a common approach is to use heterogeneous storage types. This paper presents a Storage Resource Management technique that automatically and dynamically moves data across this tiered storage based on Data Temperature, migrating "hot" data towards faster storage and "cold" data towards inexpensive archival storage. Thus, the cluster adapts to the characteristics of its workloads over time to make effective use of the scarce, expensive storage. Finally, I evaluate my modified version of the Hadoop Distributed File System (HDFS) against the vanilla version to compare their performance. The results are promising and show an improvement in both read and write performance, with the gain in read performance being especially significant.
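To make the idea of temperature-driven tiering concrete, the sketch below shows one possible way to score a file's temperature from access frequency and recency and map it to one of HDFS's built-in storage policies (ALL_SSD, HOT, COLD). This is a minimal illustration, not the paper's implementation: the scoring formula, thresholds, class and field names are assumptions introduced here for clarity.

```java
// Illustrative sketch only: a simple data-temperature score and tier decision.
// The formula, thresholds, and the FileStats class are assumptions for
// illustration; only the HDFS policy names and CLI tools mentioned are real.

import java.time.Duration;
import java.time.Instant;

public class TemperatureTiering {

    // Hypothetical per-file access statistics collected over a sliding window.
    static class FileStats {
        long accessesInWindow;   // reads observed in the current window
        Instant lastAccess;      // time of the most recent read
    }

    // A simple temperature score: frequently and recently read files are "hot".
    // Recency decays exponentially with an assumed half-life of 24 hours.
    static double temperature(FileStats s, Instant now) {
        double ageHours = Duration.between(s.lastAccess, now).toMinutes() / 60.0;
        double recencyDecay = Math.pow(0.5, ageHours / 24.0);
        return s.accessesInWindow * recencyDecay;
    }

    // Map a temperature to one of HDFS's standard storage policies.
    // Thresholds are placeholders; ALL_SSD, HOT, and COLD are real policy
    // names (SSD, DISK, and ARCHIVE replicas respectively).
    static String choosePolicy(double temp) {
        if (temp > 100.0) return "ALL_SSD"; // hottest data -> SSD tier
        if (temp > 1.0)   return "HOT";     // warm data    -> spinning disks
        return "COLD";                      // cold data    -> archival storage
    }

    public static void main(String[] args) {
        FileStats stats = new FileStats();
        stats.accessesInWindow = 500;
        stats.lastAccess = Instant.now().minus(Duration.ofHours(2));

        double temp = temperature(stats, Instant.now());
        System.out.printf("temperature=%.1f policy=%s%n", temp, choosePolicy(temp));
        // On a real cluster, the chosen policy would be applied with
        // `hdfs storagepolicies -setStoragePolicy` and enforced by `hdfs mover`.
    }
}
```

In stock HDFS, storage policies are applied manually per path and block migration is triggered by the mover tool; the technique described in the abstract differs in that the migration decisions are made automatically and continuously from observed access patterns.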