Deploying Network Key-Value SSDs to Disaggregate Resources in Big Data Processing Frameworks
Authors: Mahsa Bayati, Harsh Roogi, Ron Lee, N. Mi
Venue: 2020 IEEE 39th International Performance Computing and Communications Conference (IPCCC)
Publication date: 2020-11-06
DOI: 10.1109/IPCCC50635.2020.9391532 (https://doi.org/10.1109/IPCCC50635.2020.9391532)
Abstract
Exponential growth in data generation has led to the adoption of unstructured object storage systems as an effective way to improve performance. Key-Value (KV) SSDs are object storage devices introduced to mitigate the shortcomings of traditional key-value stores on block devices, including low device bandwidth utilization and resource-draining KV-store operations on the host CPU and block devices. Samsung KV-SSDs are built on top of NVMe over Fabrics hardware, which supports remote storage access protocols (i.e., RDMA). Network Key-Value (NKV) is a software ecosystem developed by Samsung that enables data distribution and storage disaggregation of KV-SSDs. The most widely used big data processing platforms, such as Hadoop and Presto, deploy the Hadoop Distributed File System (HDFS) to take advantage of rapid data access by co-locating storage and compute nodes. This co-location of compute and storage nodes limits scalability and resource utilization, and thus increases the total cost of ownership. In this paper, we present a new storage disaggregation model for big data processing platforms. Our new system layout leverages resource disaggregation by separating compute infrastructure from storage infrastructure, and exploits the benefits of an evolving storage technology, i.e., KV-SSD, for large-scale data access and processing. The goal of this work is to facilitate independent scaling of storage and compute resources and to shift the data retrieval load from the hosts to the storage nodes. We evaluate our architecture using the TPC-DS benchmark. Our results show that the CPU load on compute nodes is reduced by a non-negligible amount while sustaining the same performance as conventional Hadoop with HDFS.