Nicholas Chaimov, A. Malony, S. Canon, Costin Iancu, K. Ibrahim, Jayanth Srinivasan
{"title":"Scaling Spark on HPC Systems","authors":"Nicholas Chaimov, A. Malony, S. Canon, Costin Iancu, K. Ibrahim, Jayanth Srinivasan","doi":"10.1145/2907294.2907310","DOIUrl":null,"url":null,"abstract":"We report our experiences porting Spark to large production HPC systems. While Spark performance in a data center installation (with local disks) is dominated by the network, our results show that file system metadata access latency can dominate in a HPC installation using Lustre: it determines single node performance up to 4x slower than a typical workstation. We evaluate a combination of software techniques and hardware configurations designed to address this problem. For example, on the software side we develop a file pooling layer able to improve per node performance up to 2.8x. On the hardware side we evaluate a system with a large NVRAM buffer between compute nodes and the backend Lustre file system: this improves scaling at the expense of per-node performance. Overall, our results indicate that scalability is currently limited to O(102) cores in a HPC installation with Lustre and default Spark. After careful configuration combined with our pooling we can scale up to O(10^4). As our analysis indicates, it is feasible to observe much higher scalability in the near future.","PeriodicalId":20515,"journal":{"name":"Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing","volume":"1 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2016-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"79","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2907294.2907310","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 79
Abstract
We report our experiences porting Spark to large production HPC systems. While Spark performance in a data center installation (with local disks) is dominated by the network, our results show that file system metadata access latency can dominate in an HPC installation using Lustre: it can make single-node performance up to 4x slower than that of a typical workstation. We evaluate a combination of software techniques and hardware configurations designed to address this problem. For example, on the software side we develop a file pooling layer able to improve per-node performance by up to 2.8x. On the hardware side we evaluate a system with a large NVRAM buffer between the compute nodes and the backend Lustre file system: this improves scaling at the expense of per-node performance. Overall, our results indicate that scalability is currently limited to O(10^2) cores in an HPC installation with Lustre and default Spark. After careful configuration combined with our pooling we can scale up to O(10^4) cores. As our analysis indicates, it is feasible to observe much higher scalability in the near future.
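To illustrate the file-pooling idea the abstract describes, here is a minimal sketch in Scala (Spark's implementation language): instead of opening and closing shuffle files on every access, where each open/close is a Lustre metadata operation, handles are cached and reused so metadata RPCs are paid once per unique file rather than once per access. All names here (FilePool, acquire, release, maxOpen) are illustrative assumptions, not the authors' actual API.

```scala
import java.io.{File, RandomAccessFile}
import scala.collection.mutable

// Hypothetical sketch of a file-pooling layer: caches open file handles
// to amortize Lustre metadata-access latency across repeated accesses.
object FilePool {
  private val maxOpen = 1024  // assumed cap on cached descriptors (per-process fd limits apply)
  private val pool = mutable.LinkedHashMap.empty[String, RandomAccessFile]

  def acquire(path: String): RandomAccessFile = synchronized {
    pool.remove(path) match {
      case Some(raf) => raf                 // cache hit: reuse handle, no metadata RPC
      case None =>
        if (pool.size >= maxOpen) {         // evict the oldest cached handle
          val (oldPath, oldRaf) = pool.head
          pool.remove(oldPath)
          oldRaf.close()
        }
        new RandomAccessFile(new File(path), "rw")  // one open per unique file
    }
  }

  def release(path: String, raf: RandomAccessFile): Unit = synchronized {
    raf.getChannel.force(false)  // flush data, but keep the descriptor open
    pool.put(path, raf)          // return the handle to the pool for reuse
  }
}
```

A caller would wrap each shuffle-file read or write in acquire/release instead of open/close, which is where the per-node gains the abstract cites would come from; the eviction policy and pool size shown are guesses, chosen only to keep the sketch self-contained.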