Optimizing Performance and Computing Resource Management of In-memory Big Data Analytics with Disaggregated Persistent Memory

Shouwei Chen, Wensheng Wang, Xueyang Wu, Zhen Fan, Kunwu Huang, Peiyu Zhuang, Yue Li, I. Rodero, M. Parashar, Dennis Weng
{"title":"Optimizing Performance and Computing Resource Management of In-memory Big Data Analytics with Disaggregated Persistent Memory","authors":"Shouwei Chen, Wensheng Wang, Xueyang Wu, Zhen Fan, Kunwu Huang, Peiyu Zhuang, Yue Li, I. Rodero, M. Parashar, Dennis Weng","doi":"10.1109/CCGRID.2019.00012","DOIUrl":null,"url":null,"abstract":"The performance of modern Big Data frameworks, e.g. Spark, depends greatly on high-speed storage and shuffling, which impose a significant memory burden on production data centers. In many production situations, the persistence and shuffling intensive applications can suffer a major performance loss due to lack of memory. Thus, the common practice is usually to over-allocate the memory assigned to the data workers for production applications, which in turn reduces overall resource utilization. One efficient way to address the dilemma between the performance and cost efficiency of Big Data applications is through data center computing resource disaggregation. This paper proposes and implements a system that incorporates the Spark Big Data framework with a novel in-memory distributed file system to achieve memory disaggregation for data persistence and shuffling. We address the challenge of optimizing performance at affordable cost by co-designing the proposed in-memory distributed file system with large-volume DIMM-based persistent memory (PMEM) and RDMA technology. The disaggregation design allows each part of the system to be scaled independently, which is particularly suitable for cloud deployments. The proposed system is evaluated in a production-level cluster using real enterprise-level Spark production applications. The results of an empirical evaluation show that the system can achieve up to a 3.5- fold performance improvement for shuffle-intensive applications with the same amount of memory, compared to the default Spark setup. Moreover, by leveraging PMEM, we demonstrate that our system can effectively increase the memory capacity of the computing cluster with affordable cost, with a reasonable execution time overhead with respect to using local DRAM only.","PeriodicalId":234571,"journal":{"name":"2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CCGRID.2019.00012","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

The performance of modern Big Data frameworks, e.g., Spark, depends greatly on high-speed storage and shuffling, which impose a significant memory burden on production data centers. In many production situations, persistence- and shuffling-intensive applications can suffer a major performance loss due to lack of memory. Thus, the common practice is to over-allocate the memory assigned to the data workers for production applications, which in turn reduces overall resource utilization. One efficient way to address the dilemma between the performance and cost efficiency of Big Data applications is data center computing resource disaggregation. This paper proposes and implements a system that incorporates the Spark Big Data framework with a novel in-memory distributed file system to achieve memory disaggregation for data persistence and shuffling. We address the challenge of optimizing performance at affordable cost by co-designing the proposed in-memory distributed file system with large-volume DIMM-based persistent memory (PMEM) and RDMA technology. The disaggregation design allows each part of the system to be scaled independently, which is particularly suitable for cloud deployments. The proposed system is evaluated in a production-level cluster using real enterprise-level Spark production applications. The results of an empirical evaluation show that the system can achieve up to a 3.5-fold performance improvement for shuffle-intensive applications with the same amount of memory, compared to the default Spark setup. Moreover, by leveraging PMEM, we demonstrate that our system can effectively increase the memory capacity of the computing cluster at affordable cost, with reasonable execution-time overhead compared to using local DRAM only.
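The paper itself does not include code here; the following is a minimal, hypothetical Spark (Scala) sketch of the general idea the abstract describes, i.e., moving persisted data and shuffle files off executor DRAM onto an external memory tier. The "dmfs://pmem-cluster/..." URI scheme, the "/mnt/pmem0/..." mount path, and the sizes are illustrative assumptions, not the authors' actual file system or configuration; spark.memory.offHeap.*, spark.local.dir, and StorageLevel.OFF_HEAP are standard Spark settings.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.storage.StorageLevel

    // Illustrative sketch only: URIs and paths below are hypothetical stand-ins
    // for the paper's in-memory distributed file system and a PMEM-backed mount.
    object DisaggregatedMemoryExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("shuffle-intensive-job")
          // Keep cached data off the executor JVM heap (standard Spark options).
          .config("spark.memory.offHeap.enabled", "true")
          .config("spark.memory.offHeap.size", "32g")
          // Direct shuffle spill/output files to a PMEM-backed mount (hypothetical path).
          .config("spark.local.dir", "/mnt/pmem0/spark-shuffle")
          .getOrCreate()

        // Read input from the disaggregated storage tier (hypothetical URI scheme).
        val events = spark.read.parquet("dmfs://pmem-cluster/events")

        // Persist the intermediate result off-heap so the external memory tier,
        // rather than executor DRAM, holds it across downstream stages.
        val perUser = events.groupBy("userId").count().persist(StorageLevel.OFF_HEAP)

        perUser.write.parquet("dmfs://pmem-cluster/per-user-counts")
        spark.stop()
      }
    }

In a setup along these lines, the off-heap and shuffle tiers can be sized independently of executor heap memory, which is the independent-scaling property the abstract attributes to the disaggregated design.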