High Performance Design for HDFS with Byte-Addressability of NVM and RDMA

Proceedings of the 2016 International Conference on Supercomputing
Pub Date: 2016-06-01
DOI: 10.1145/2925426.2926290

Nusrat S. Islam, Md. Wasi-ur-Rahman, Xiaoyi Lu, D. Panda


Citations: 68

Abstract

Non-Volatile Memory (NVM) offers byte-addressability with DRAM-like performance along with persistence. Thus, NVMs provide the opportunity to build high-throughput storage systems for data-intensive applications. HDFS (Hadoop Distributed File System) is the primary storage engine for MapReduce, Spark, and HBase. Even though HDFS was initially designed for commodity hardware, it is increasingly being used on HPC (High Performance Computing) clusters. The demanding performance requirements of HPC systems make the I/O bottlenecks of HDFS a critical issue and motivate rethinking its storage architecture over NVMs. In this paper, we present a novel design for HDFS that leverages the byte-addressability of NVM for RDMA (Remote Direct Memory Access)-based communication. We analyze the performance potential of using NVM for HDFS and re-design HDFS I/O with memory semantics to fully exploit the byte-addressability. We call this design NVFS (NVM- and RDMA-aware HDFS). We also present cost-effective acceleration techniques for HBase and Spark that utilize the NVM-based design of HDFS by storing only the HBase Write-Ahead Logs and Spark job outputs to NVM, respectively. We also propose enhancements to use the NVFS design as a burst buffer for running Spark jobs on top of parallel file systems like Lustre. Performance evaluations show that our design can improve the write and read throughputs of HDFS by up to 4x and 2x, respectively. The execution times of data generation benchmarks are reduced by up to 45%. The proposed design also reduces the overall execution time of the SWIM workload by up to 18% over HDFS, with a maximum benefit of 37% for job-38. For Spark TeraSort, our proposed scheme yields a performance gain of up to 11%. The performance of HBase insert, update, and read operations is improved by 21%, 16%, and 26%, respectively. Our NVM-based burst buffer can improve the I/O performance of Spark PageRank by up to 24% over Lustre. To the best of our knowledge, this paper is the first attempt to incorporate NVM with RDMA for HDFS.
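The contrast between file-style I/O and the "memory semantics" mentioned in the abstract can be illustrated with a minimal sketch. This is not the NVFS implementation: it assumes an ordinary temporary file mapped with Python's `mmap` as a stand-in for a byte-addressable NVM region, and the names `REGION_SIZE`, `make_region`, `write_block_semantics`, `write_memory_semantics`, and `read_memory_semantics` are hypothetical, introduced only for illustration.

```python
import mmap
import os
import tempfile

REGION_SIZE = 4096  # hypothetical size of the emulated NVM region, in bytes


def make_region():
    """Create a zero-filled temporary file that stands in for an NVM region."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"\0" * REGION_SIZE)
    return path


def write_block_semantics(path, offset, payload):
    """File-style update: seek, write through the file API, then fsync."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())


def write_memory_semantics(path, offset, payload):
    """Memory-style update: map the region and store the bytes in place."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), REGION_SIZE) as region:
            region[offset:offset + len(payload)] = payload
            # flush() is only a stand-in for a real persistence barrier.
            region.flush()


def read_memory_semantics(path, offset, length):
    """Memory-style read: load the bytes directly from the mapped region."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), REGION_SIZE) as region:
            return bytes(region[offset:offset + length])
```

In this sketch both write paths leave the same bytes in the region; the difference is that the memory-style path addresses individual bytes with loads and stores instead of going through seek and write calls, which is the property the abstract refers to as byte-addressability.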



