International Workshop on OpenCL: Latest Publications, Page 3

Optimization of Fast Fourier Transform for Qualcomm Adreno Graphics Processing Unit

International Workshop on OpenCL Pub Date : 2024-04-08 DOI: 10.1145/3648115.3648119

Skyler Szot, Hongqiang Wang, Alexander Angus

Citations: 0

Experiences with implementing Kokkos' SYCL backend

International Workshop on OpenCL Pub Date : 2024-04-08 DOI: 10.1145/3648115.3648118

Daniel Arndt, Damien Lebrun-Grandié, Christian Trott

Citations: 0

A source-to-source CUDA to SYCL code migration tool: Intel® DPC++ Compatibility Tool

International Workshop on OpenCL Pub Date : 2022-05-10 DOI: 10.1145/3529538.3529562

Zhiming Wang, Yury Plyakhin, Chenwei Sun, Ziran Zhang, Z. Jiang, Andy Huang, Hao Wang

Abstract: oneAPI [1] is an industry initiative creating an open, standards-based, cross-architecture programming model to simplify development for a wide range of data-centric workloads across a variety of architectures including CPU, GPU, FPGA, and other accelerators. It includes a cross-architecture compiler, Data Parallel C++ (DPC++) [2], to support ISO C++ and Khronos Group's SYCL [3], and advanced libraries. Intel has created a product implementation of oneAPI with the Intel oneAPI Toolkits. These help developers efficiently build, analyze, and optimize high-performance, cross-architecture applications for CPUs, GPUs, and FPGAs. SYCL [3] is an open standard from Khronos for a portable, architecture-neutral language for expressing parallelism. The SYCL specification can be implemented by anybody for any platform. To take advantage of oneAPI and SYCL, developers of applications written in another language, e.g., CUDA, seek to migrate existing code to SYCL. Once a customer migrates their code to SYCL, they are no longer tied to a single platform and can run the code on any platform that has SYCL compiler support. The Intel® DPC++ Compatibility Tool, included in the Intel® oneAPI Base Toolkit, assists developers with source-to-source migration, e.g., migrating code written in CUDA to SYCL [3] so that it can run on multiple platforms. The tool generates human-readable and maintainable code whenever possible and provides inline comments to help developers complete their code.

Typically, about 90-95% of CUDA code in applications can be migrated by this tool. Completion and verification of the final code is expected to be a manual process done by the developers. The goal of the compatibility tool is to make it as easy as possible for developers to migrate their existing CUDA codebase to SYCL, to facilitate more hardware choices and access to the advantages of oneAPI and SYCL. The compatibility tool is based on LLVM/Clang [4]. It mainly contains 3 functional components:

• The intercept-build tool: It collects the compilation options of the user input project, such as build options, macro definitions, and include folders, by intercepting the project's build process. During source-to-source migration, those compilation options are used to identify the active code path and header file dependencies in order to build a correct abstract syntax tree for the user input project.

• The ‘dpct’ binary tool: This is the main migration tool, which does source-to-source migration based on compiler front-end technology. It implements a set of migration rules to migrate source language elements such as types, APIs, and macros to functionally compatible elements in the target language. If some C/C++ code is the same in the source and target languages, the tool keeps this C/C++ code unchanged. The tool also lets users define their own migration rules in a migration rule description file to guide a customized…

Citations: 5

Towards performance portability of AI models using SYCL-DNN

International Workshop on OpenCL Pub Date : 2022-05-10 DOI: 10.1145/3529538.3529999

Muhammad Tanvir, Kumudha Narasimhan, M. Goli, Ouadie El Farouki, S. Georgiev, Isaac Ault

Abstract: The wide adoption of Deep Neural Networks (DNN) has served as an incentive to design and manufacture powerful and specialized hardware technologies, targeting systems from Edge devices to Cloud and supercomputers. This huge diversity soon becomes a burden due to the emerging dependencies between development stacks and deployment hardware. While ONNX, proposed as a de facto standard for AI model description, provides portability of AI models across various AI frameworks, supporting DNN models on various hardware architectures remains challenging. Several existing AI frameworks such as TensorFlow, PyTorch, and ONNXRuntime provide performance portability via dedicated backend implementations per hardware architecture. While such an approach provides wider support of hardware devices, maintainability and readability remain challenging. There are many libraries and frameworks which were developed to support neural network models and we discuss some of the important ones in this section. Frameworks like Glow [18], nGraph [14] and Tensor Comprehensions [19] use a compiler-based approach to accept the neural network model and emit optimised code for specific hardware. The neural network model is lowered into one or more intermediate representations before generating an optimised kernel. These frameworks target a specific set of backends, and targeting any new hardware requires implementing a considerable fraction of the operators. Other frameworks like Caffe [16], PyTorch [17] and TinyNN [10] provide a runtime solution by integrating various vendor-specific libraries or graph as a backend to support neural network models on different sets of architectures.

Frameworks like TensorFlow [11] rely on calling vendor-specific libraries or graph compilers. While embedding a vendor-specific library can lead to near-metal performance, it can make adding and maintaining different backends quite tedious. Intel oneMKL [4] and oneDNN [7] are the optimized libraries for linear algebra subroutines and deep neural network routines for multi-core and manycore Intel systems. Recently, oneMKL and oneDNN have added support for running on Nvidia GPUs as well [15] via SYCL interoperability with third-party libraries. This approach integrates the existing vendor-optimised backend in SYCL to provide a unique SYCL interface for memory management and runtime control from the user's point of view while reusing the highly optimised vendor backend. ARM Compute Library [1], cuBLAS [6] and cuDNN [13], and MIOpen [5] provide optimised routines for linear algebra and machine learning for ARM, Nvidia and AMD respectively. All these libraries are optimised for specific architectures, and very rarely provide portability. SYCL provides a C++-based portable parallel programming model to target various devices like CPUs, GPUs, DSPs, FPGAs, etc. The SYCL programming model allows developers to write highly parametrized kernels for a diverse hardware set in a unified setting. These kernels can then…

Citations: 3

A Comparison of SYCL, OpenCL, CUDA, and OpenMP for Massively Parallel Support Vector Machine Classification on Multi-Vendor Hardware

International Workshop on OpenCL Pub Date : 2022-05-10 DOI: 10.1145/3529538.3529980

Marcel Breyer, Alexander Van Craen, D. Pflüger

Abstract: In scientific computing and Artificial Intelligence (AI), which both rely on massively parallel tasks, frameworks like the Compute Unified Device Architecture (CUDA) and the Open Computing Language (OpenCL) are widely used to harvest the computational power of accelerator cards, in particular of Graphics Processing Units (GPUs). A few years ago, GPUs from NVIDIA were used almost exclusively for these tasks, but meanwhile AMD and Intel are increasing their shares of the GPU market. This introduces many new challenges for code development, as the prevailing CUDA code can only run on NVIDIA hardware and must be adapted or even completely rewritten to run on GPUs from AMD or Intel. In this paper, we compare the different competing programming frameworks OpenMP, CUDA, OpenCL, and SYCL, paying special attention to the two SYCL implementations hipSYCL and DPC++. Thereby, we investigate the different frameworks with respect to their usability, performance, and performance portability on a variety of hardware platforms from different vendors, i.e., GPUs from NVIDIA, AMD, and Intel and Central Processing Units (CPUs) from AMD and Intel. Besides discussing the runtimes of these frameworks on the different hardware platforms, we also focus our comparison on the differences between the nd_range kernel formulation and the SYCL-specific hierarchical kernels. Our Parallel Least Squares Support Vector Machine (PLSSVM) library implements backends for the four previously mentioned programming frameworks for Least Squares Support Vector Machines (LS-SVMs).

Using it as an example, we show which of the frameworks is best suited for a standard workload that is frequently employed in scientific computing and AI, depending on the target hardware: the most computationally intensive part of our PLSSVM library is solving a system of linear equations using the Conjugate Gradient (CG) method. Specifically, we parallelize the implicit matrix-vector multiplication inside the CG method, a workload common in many scientific codes. The PLSSVM code, utility scripts, and documentation are all available on GitHub: https://github.com/SC-SGS/PLSSVM.
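The workload named in this abstract, a Conjugate Gradient solve whose matrix-vector product is computed implicitly, can be illustrated with a minimal pure-Python sketch. It is not taken from PLSSVM: it assumes a simplified system matrix A = K + I/cost with a linear kernel K[i][j] = <x_i, x_j>, and the names `implicit_matvec`, `conjugate_gradient`, and `cost` are hypothetical. The outer loop over rows in `implicit_matvec` is the part a GPU backend would typically run in parallel.

```python
def dot(u, v):
    """Plain dot product of two equally long sequences."""
    return sum(a * b for a, b in zip(u, v))


def implicit_matvec(points, cost, v):
    """Return (K + I/cost) @ v without ever storing K.

    Assumption for illustration: K[i][j] is the linear kernel <x_i, x_j>.
    """
    out = []
    for i, x_i in enumerate(points):      # rows are independent: parallelizable
        acc = 0.0
        for j, x_j in enumerate(points):  # kernel entries are recomputed on the fly
            acc += dot(x_i, x_j) * v[j]
        out.append(acc + v[i] / cost)
    return out


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A given only as matvec."""
    x = [0.0] * len(b)
    r = list(b)                 # residual b - A x for the start vector x = 0
    d = list(r)                 # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:  # stop once the residual norm is small enough
            break
        Ad = matvec(d)
        alpha = rs_old / dot(d, Ad)
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * adi for ri, adi in zip(r, Ad)]
        rs_new = dot(r, r)
        d = [ri + (rs_new / rs_old) * di for ri, di in zip(r, d)]
        rs_old = rs_new
    return x


if __name__ == "__main__":
    points = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]  # made-up sample data
    rhs = [1.0, -1.0, 2.0]
    solution = conjugate_gradient(lambda v: implicit_matvec(points, 1.0, v), rhs)
    print(solution)
```

Recomputing kernel entries in every product trades extra arithmetic for memory, since the full n × n matrix never has to be stored; that trade-off is what makes the product "implicit" in this sketch.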

Citations: 9

Reaching even richer C++ in OpenCL kernels with use of libclcxx

International Workshop on OpenCL Pub Date : 2022-05-10 DOI: 10.1145/3529538.3529540

Anastasia Stulova, Ishfaq Wardag

Citations: 0

Celerity: How (Well) Does the SYCL API Translate to Distributed Clusters?

International Workshop on OpenCL Pub Date : 2022-05-10 DOI: 10.1145/3529538.3530004

Philip Salzmann, Fabian Knorr, Peter Thoman, Biagio Cosenza

Citations: 0

Interfacing SYCL and Python for XPU Programming

International Workshop on OpenCL Pub Date : 2022-05-10 DOI: 10.1145/3529538.3529990

O. Pavlyk, Diptorup Deb

Citations: 0

Compiler-aided nd-range parallel-for implementations on CPU in hipSYCL

International Workshop on OpenCL Pub Date : 2022-05-10 DOI: 10.1145/3529538.3530216

Joachim Meyer, Aksel Alpay, H. Fröning, V. Heuveline

Citations: 4

An Overview of OpenCL Vendor Extensions Supported in Qualcomm Adreno GPUs

International Workshop on OpenCL Pub Date : 2022-05-10 DOI: 10.1145/3529538.3530002

Hongqiang Wang, Balaji Calidas

Citations: 0