SWAT: A Programmable, In-Memory, Distributed, High-Performance Computing Platform

M. Grossman, Vivek Sarkar
Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, published 2016-05-31
DOI: https://doi.org/10.1145/2907294.2907307
Citations: 20

Abstract

The field of data analytics is currently going through a renaissance as a result of ever-increasing dataset sizes, the value of the models that can be trained from those datasets, and a surge in flexible, distributed programming models. In particular, the Apache Hadoop and Spark programming systems, as well as their supporting projects (e.g., HDFS, SparkSQL), have greatly simplified the analysis and transformation of datasets whose size exceeds the capacity of a single machine. While these programming models facilitate the use of distributed systems to analyze large datasets, they have been plagued by performance issues. The I/O performance bottlenecks of Hadoop are partially responsible for the creation of Spark. Performance bottlenecks in Spark due to the JVM object model, garbage collection, interpreted/managed execution, and other abstraction layers are responsible for the creation of additional optimization layers, such as Project Tungsten. Indeed, the Project Tungsten issue tracker states that the "majority of Spark workloads are not bottlenecked by I/O or network, but rather CPU and memory". In this work, we address the CPU and memory performance bottlenecks that exist in Apache Spark by running user-written computational kernels on accelerators. We refer to our approach as Spark With Accelerated Tasks (SWAT). SWAT is an accelerated data analytics (ADA) framework that enables programmers to natively execute Spark applications on high-performance hardware platforms with co-processors, while continuing to write their applications in a JVM-based language like Java or Scala. Runtime code generation creates OpenCL kernels from JVM bytecode, which are then executed on OpenCL accelerators. In our work we emphasize 1) full compatibility with a modern, existing, and accepted data analytics platform, 2) an asynchronous, event-driven, and resource-aware runtime, 3) multi-GPU memory management and caching, and 4) ease of use and programmability. Our performance evaluation demonstrates up to 3.24x overall application speedup relative to Spark across six machine learning benchmarks, with a detailed investigation of these performance improvements.
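
To make "user-written computational kernel" concrete, the sketch below shows the general shape such a kernel might take: a pure, per-element function written in ordinary Java and applied across a dataset. It is only an illustration under stated assumptions. The names `KernelSketch`, `sigmoid`, and `mapKernel` are hypothetical, the code uses plain Java streams as a stand-in for a Spark `map`, and it does not use or reproduce the SWAT or Spark APIs, which the abstract does not show.

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;

public class KernelSketch {
    // Hypothetical user-written kernel: a side-effect-free function of one
    // element. Functions of this shape (no I/O, no shared mutable state) are
    // the kind of code that could plausibly be translated into a
    // data-parallel accelerator kernel.
    public static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // Stand-in for a distributed map: applies the kernel to every element
    // independently. This runs on the CPU via Java streams; it is NOT the
    // SWAT runtime and performs no OpenCL code generation.
    public static double[] mapKernel(double[] input, DoubleUnaryOperator kernel) {
        return Arrays.stream(input).map(kernel).toArray();
    }

    public static void main(String[] args) {
        double[] out = mapKernel(new double[] {-1.0, 0.0, 1.0}, KernelSketch::sigmoid);
        System.out.println(Arrays.toString(out));
    }
}
```

Because each output element depends only on the matching input element, the work maps naturally onto one accelerator work-item per element, which is the property that makes offloading such kernels attractive.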