Performance Evolution of Different SYCL Implementations based on the Parallel Least Squares Support Vector Machine Library

Proceedings of the 2023 International Workshop on OpenCL. Pub Date: 2023-04-18. DOI: 10.1145/3585341.3585369

Marcel Breyer, Alexander Van Craen, D. Pflüger


Citations: 1

Abstract

In machine learning and scientific computing, some of the biggest challenges lie in efficient, performant, and portable computing. With our Parallel Least Squares Support Vector Machine (PLSSVM) library, we have not only developed an unrivaled Support Vector Machine (SVM) implementation for huge dense data sets, but have also created a representative benchmark for a task frequently encountered in scientific computing: an (implicit) matrix-vector multiplication. PLSSVM supports multiple backends (OpenMP, CUDA, HIP, OpenCL, and SYCL) in order to target the most widely used hardware platforms in machine learning and scientific computing. In this paper, we use PLSSVM to compare different DPC++ and Open SYCL (formerly known as hipSYCL) versions over the period of one year. Furthermore, we compare two versions (one from February and the other from November 2022) with each other and report their respective performance evolution in depth. We also put these results in relation to our other implemented backends and report their performance portability on three different hardware platforms: an NVIDIA GPU, an AMD GPU, and an Intel CPU. Our results show that installing new DPC++ and Open SYCL versions can have surprisingly large impacts in both directions. In our case, the nd_range kernel runtimes were faster on an NVIDIA GPU when using a newer DPC++ compiler. For Open SYCL, the new omp.accelerated compilation flow likewise improves the nd_range performance on CPUs. Compared to OpenCL, SYCL in our results also offers better performance portability while being easier to use, as indicated by the drastically fewer lines of code needed in our PLSSVM library. Under the performance portability metric of Pennycook et al. [23], DPC++ achieved the highest value, ahead of OpenCL. The code, utility scripts, and documentation are all publicly available on GitHub: https://github.com/SC-SGS/PLSSVM.
