International Workshop on OpenCL: Latest Publications

Optimization of Fast Fourier Transform for Qualcomm Adreno Graphics Processing Unit
International Workshop on OpenCL | Pub date: 2024-04-08 | DOI: 10.1145/3648115.3648119
Skyler Szot, Hongqiang Wang, Alexander Angus
Citations: 0
Experiences with implementing Kokkos' SYCL backend
International Workshop on OpenCL | Pub date: 2024-04-08 | DOI: 10.1145/3648115.3648118
Daniel Arndt, Damien Lebrun-Grandié, Christian Trott
Citations: 0
A source-to-source CUDA to SYCL code migration tool: Intel® DPC++ Compatibility Tool
International Workshop on OpenCL | Pub date: 2022-05-10 | DOI: 10.1145/3529538.3529562
Zhiming Wang, Yury Plyakhin, Chenwei Sun, Ziran Zhang, Z. Jiang, Andy Huang, Hao Wang
Abstract: oneAPI [1] is an industry initiative creating an open, standards-based, cross-architecture programming model to simplify development for a wide range of data-centric workloads across a variety of architectures, including CPUs, GPUs, FPGAs, and other accelerators. It includes a cross-architecture compiler, Data Parallel C++ (DPC++) [2], which supports ISO C++ and Khronos Group's SYCL [3], as well as advanced libraries. Intel has created a product implementation of oneAPI with the Intel oneAPI Toolkits, which help developers efficiently build, analyze, and optimize high-performance, cross-architecture applications for CPUs, GPUs, and FPGAs. SYCL [3] is an open standard from Khronos for a portable, architecture-neutral language for expressing parallelism; the SYCL specification can be implemented by anybody for any platform. To take advantage of oneAPI and SYCL, developers of applications written in another language, e.g., CUDA, seek to migrate their existing code to SYCL. Once migrated to SYCL, the code is no longer tied to a single platform and can run on any platform with SYCL compiler support. The Intel® DPC++ Compatibility Tool, included in the Intel® oneAPI Base Toolkit, assists developers with source-to-source migration, e.g., migrating CUDA code to SYCL [3] so that it can run on multiple platforms. The tool generates human-readable, maintainable code whenever possible and provides inline comments to help developers complete their code.
Typically, about 90-95% of the CUDA code in an application can be migrated by this tool.¹ Completing the code and verifying the final result is expected to be a manual process performed by the developers. The goal of the compatibility tool is to make it as easy as possible for developers to migrate their existing CUDA codebases to SYCL, opening up more hardware choices and access to the advantages of oneAPI and SYCL. The compatibility tool is based on LLVM/Clang [4] and consists mainly of three functional components:
• The intercept-build tool: collects the compilation options of the user's project, such as build options, macro definitions, and include directories, by intercepting the project's build process. During source-to-source migration, these options are used to identify the active code path and header-file dependencies so that a correct abstract syntax tree is built for the project.
• The 'dpct' binary: the main migration tool, which performs source-to-source migration based on compiler front-end technology. It implements a set of migration rules that map source-language elements such as types, APIs, and macros to functionally compatible elements in the target language; C/C++ code that is identical in the source and target languages is kept unchanged. The tool also lets users define their own migration rules in a migration rule description file to guide a customized…
Citations: 5
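The dpct description above says the tool maps CUDA source elements, such as built-in variables and APIs, to functionally compatible SYCL elements through a set of migration rules. The following is a toy sketch of that rule-table idea, assuming plain text substitution; the real tool applies its rules on a Clang abstract syntax tree with full context. The `Rule` type, the `kIndexRules` table, the `apply_rules` helper, and the `item_ct1` identifier are hypothetical illustrations, not dpct's actual rules or output format. The reversed dimension index (CUDA `x` mapped to SYCL dimension 2) follows the convention that SYCL's last dimension varies fastest.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy illustration only: (CUDA pattern, SYCL replacement) pairs applied as
// plain text substitutions. The names below are hypothetical, not dpct rules.
using Rule = std::pair<std::string, std::string>;

const std::vector<Rule> kIndexRules = {
    // CUDA's x dimension maps to SYCL dimension 2, the fastest-varying one.
    {"threadIdx.x", "item_ct1.get_local_id(2)"},
    {"blockIdx.x", "item_ct1.get_group(2)"},
    {"blockDim.x", "item_ct1.get_local_range(2)"},
};

// Replaces every occurrence of each rule's pattern, one rule at a time.
std::string apply_rules(std::string src, const std::vector<Rule>& rules) {
    for (const auto& [from, to] : rules) {
        assert(!from.empty());  // an empty pattern would never advance
        std::size_t pos = 0;
        while ((pos = src.find(from, pos)) != std::string::npos) {
            src.replace(pos, from.size(), to);
            pos += to.size();  // continue searching after the inserted text
        }
    }
    return src;
}
```

A text-level rewriter like this cannot tell a CUDA built-in from, say, a struct member that happens to be named `threadIdx`, which is one reason matching on a syntax tree is preferable.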
Towards performance portability of AI models using SYCL-DNN
International Workshop on OpenCL | Pub date: 2022-05-10 | DOI: 10.1145/3529538.3529999
Muhammad Tanvir, Kumudha Narasimhan, M. Goli, Ouadie El Farouki, S. Georgiev, Isaac Ault
Abstract: The wide adoption of Deep Neural Networks (DNNs) has served as an incentive to design and manufacture powerful, specialized hardware, targeting systems from edge devices to the cloud and supercomputers. This huge diversity soon becomes a burden because of the dependencies that emerge between development stacks and deployment hardware. While ONNX, proposed as a de facto standard for AI model description, provides portability of AI models across AI frameworks, supporting DNN models on various hardware architectures remains challenging. Several existing AI frameworks, such as TensorFlow, PyTorch, and ONNX Runtime, provide performance portability via dedicated backend implementations for each hardware architecture. While such an approach supports a wider range of hardware devices, maintainability and readability remain challenging. Many libraries and frameworks have been developed to support neural network models, and we discuss some of the important ones in this section. Frameworks like Glow [18], nGraph [14], and Tensor Comprehensions [19] use a compiler-based approach that accepts the neural network model and emits optimised code for specific hardware. The model is lowered into one or more intermediate representations before an optimised kernel is generated. These frameworks target a specific set of backends, and supporting any new hardware requires implementing a considerable fraction of the operators. Other frameworks, like Caffe [16], PyTorch [17], and TinyNN [10], provide a runtime solution by integrating various vendor-specific libraries or graphs as backends to support neural network models on different sets of architectures.
Frameworks like TensorFlow [11] rely on calling vendor-specific libraries or graph compilers. While embedding a vendor-specific library can achieve near-metal performance, it can make adding and maintaining different backends quite tedious. Intel oneMKL [4] and oneDNN [7] are optimized libraries of linear algebra and deep neural network routines for multi-core and many-core Intel systems. Recently, oneMKL and oneDNN have also added support for running on Nvidia GPUs [15] via SYCL interoperability with third-party libraries. This approach integrates the existing vendor-optimised backend into SYCL, giving users a single SYCL interface for memory management and runtime control while reusing the highly optimised vendor backend. ARM Compute Library [1], cuBLAS [6] and cuDNN [13], and MIOpen [5] provide optimised routines for linear algebra and machine learning for ARM, Nvidia, and AMD, respectively. All these libraries are optimised for specific architectures and very rarely provide portability. SYCL provides a C++-based portable parallel programming model for targeting devices such as CPUs, GPUs, DSPs, and FPGAs. The SYCL programming model allows developers to write highly parametrized kernels for a diverse hardware set in a unified setting. These kernels can then b…
Citations: 3
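The SYCL-DNN abstract describes writing highly parametrized kernels once and specializing them per device. As a plain C++ illustration of that idea (CPU-only, not SYCL), assuming a compile-time tile size as the tunable parameter: the `tiled_matmul` template is hypothetical and is not taken from SYCL-DNN.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical parametrized kernel: Tile is a compile-time tuning knob, so one
// source can be instantiated with a different tile size per target.
// Computes C = A * B for row-major A (M x K), B (K x N), C (M x N).
template <std::size_t Tile>
void tiled_matmul(const std::vector<float>& A, const std::vector<float>& B,
                  std::vector<float>& C, std::size_t M, std::size_t K,
                  std::size_t N) {
    static_assert(Tile > 0, "tile size must be positive");
    assert(A.size() == M * K && B.size() == K * N);
    C.assign(M * N, 0.0f);
    for (std::size_t i0 = 0; i0 < M; i0 += Tile)
        for (std::size_t k0 = 0; k0 < K; k0 += Tile)
            for (std::size_t j0 = 0; j0 < N; j0 += Tile)
                // Process one Tile x Tile block; std::min clips partial edge tiles.
                for (std::size_t i = i0; i < std::min(i0 + Tile, M); ++i)
                    for (std::size_t k = k0; k < std::min(k0 + Tile, K); ++k) {
                        const float a = A[i * K + k];
                        for (std::size_t j = j0; j < std::min(j0 + Tile, N); ++j)
                            C[i * N + j] += a * B[k * N + j];
                    }
}
```

Every instantiation computes the same result; a library might pick `Tile` per device, for example from a tuning run, without duplicating the kernel source.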
A Comparison of SYCL, OpenCL, CUDA, and OpenMP for Massively Parallel Support Vector Machine Classification on Multi-Vendor Hardware
International Workshop on OpenCL | Pub date: 2022-05-10 | DOI: 10.1145/3529538.3529980
Marcel Breyer, Alexander Van Craen, D. Pflüger
Abstract: In scientific computing and Artificial Intelligence (AI), which both rely on massively parallel tasks, frameworks like the Compute Unified Device Architecture (CUDA) and the Open Computing Language (OpenCL) are widely used to harness the computational power of accelerator cards, in particular Graphics Processing Units (GPUs). A few years ago, GPUs from NVIDIA were used almost exclusively for these tasks, but AMD and Intel have since been increasing their shares of the GPU market. This introduces many new challenges for code development, as the prevailing CUDA code can only run on NVIDIA hardware and must be adapted or even completely rewritten to run on GPUs from AMD or Intel. In this paper, we compare the competing programming frameworks OpenMP, CUDA, OpenCL, and SYCL, paying special attention to the two SYCL implementations hipSYCL and DPC++. We investigate the frameworks with respect to their usability, performance, and performance portability on a variety of hardware platforms from different vendors, namely GPUs from NVIDIA, AMD, and Intel and Central Processing Units (CPUs) from AMD and Intel. Besides discussing the runtimes of these frameworks on the different hardware platforms, we also compare the nd_range kernel formulation with the SYCL-specific hierarchical kernels. Our Parallel Least Squares Support Vector Machine (PLSSVM) library implements backends for all four of these programming frameworks for Least Squares Support Vector Machines (LS-SVMs).
Using it as an example, we show which of the frameworks is best suited, depending on the target hardware, for a standard workload that is frequently employed in scientific computing and AI: the most computationally intensive part of our PLSSVM library is solving a system of linear equations using the Conjugate Gradient (CG) method. Specifically, we parallelize the implicit matrix-vector multiplication inside the CG method, a workload common in many scientific codes. The PLSSVM code, utility scripts, and documentation are all available on GitHub: https://github.com/SC-SGS/PLSSVM.
Citations: 9
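The PLSSVM abstract centers on a CG solve whose matrix-vector product is computed implicitly. A minimal serial sketch of textbook CG, assuming the operator is supplied as a callback so the matrix is never stored, is shown below; it is not PLSSVM's code. The hypothetical `implicit_kernel_matvec` builds a linear-kernel Gram matrix plus a diagonal shift on the fly and only loosely resembles the LS-SVM system PLSSVM actually solves; its nested loop is the kind of implicit matrix-vector work the abstract says is parallelized.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;
// The operator is a callback: matvec(x) may compute A*x without storing A.
using MatVec = std::function<Vec(const Vec&)>;

double dot(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Conjugate Gradient for a symmetric positive definite operator, x0 = 0.
Vec conjugate_gradient(const MatVec& matvec, const Vec& b,
                       double tol = 1e-10, std::size_t max_iter = 1000) {
    Vec x(b.size(), 0.0);
    Vec r = b;  // residual b - A*x, with x = 0
    Vec p = r;  // search direction
    double rr = dot(r, r);
    for (std::size_t it = 0; it < max_iter && std::sqrt(rr) > tol; ++it) {
        const Vec Ap = matvec(p);
        const double alpha = rr / dot(p, Ap);
        for (std::size_t i = 0; i < x.size(); ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        const double rr_new = dot(r, r);
        const double beta = rr_new / rr;
        for (std::size_t i = 0; i < p.size(); ++i) p[i] = r[i] + beta * p[i];
        rr = rr_new;
    }
    return x;
}

// Hypothetical implicit operator: (K + shift * I) * v with K_ij = <data_i, data_j>,
// each entry of K computed on demand instead of being materialized.
Vec implicit_kernel_matvec(const std::vector<Vec>& data, double shift, const Vec& v) {
    Vec out(data.size(), 0.0);
    for (std::size_t i = 0; i < data.size(); ++i) {
        for (std::size_t j = 0; j < data.size(); ++j)
            out[i] += dot(data[i], data[j]) * v[j];
        out[i] += shift * v[i];
    }
    return out;
}
```

For example, `conjugate_gradient([&](const Vec& v) { return implicit_kernel_matvec(data, 1.0, v); }, b)` solves the shifted Gram system without ever forming the matrix.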
Reaching even richer C++ in OpenCL kernels with use of libclcxx
International Workshop on OpenCL | Pub date: 2022-05-10 | DOI: 10.1145/3529538.3529540
Anastasia Stulova, Ishfaq Wardag
Citations: 0
Celerity: How (Well) Does the SYCL API Translate to Distributed Clusters?
International Workshop on OpenCL | Pub date: 2022-05-10 | DOI: 10.1145/3529538.3530004
Philip Salzmann, Fabian Knorr, Peter Thoman, Biagio Cosenza
Citations: 0
Interfacing SYCL and Python for XPU Programming
International Workshop on OpenCL | Pub date: 2022-05-10 | DOI: 10.1145/3529538.3529990
O. Pavlyk, Diptorup Deb
Abstract: This paper introduces a new framework to help build and use SYCL-based Python native extensions. We present the core design and implementation details of the framework, including an overview of the API, a technique to support asynchronous SYCL kernel execution via Python, and a discussion of using Python extension generator tools to build SYCL-based extensions. Details of ongoing work are presented, and we demonstrate the development of a performance-portable Python native extension that relies on the SYCL-based oneMKL specification.
Citations: 0
Compiler-aided nd-range parallel-for implementations on CPU in hipSYCL
International Workshop on OpenCL | Pub date: 2022-05-10 | DOI: 10.1145/3529538.3530216
Joachim Meyer, Aksel Alpay, H. Fröning, V. Heuveline
Abstract: With heterogeneous programming continuously on the rise, performance portability still needs to be improved. SYCL provides the nd-range parallel-for paradigm for writing data-parallel kernels. This model allows barriers for group-local synchronization, similar to CUDA and OpenCL kernels. GPUs provide efficient means to model this, but on CPUs the necessary forward-progress guarantees require the use of many (lightweight) threads in library-only SYCL implementations, rendering the nd-range parallel-for unacceptably inefficient. By adopting two compiler-based approaches that solve this, the present work improves the performance of the nd-range parallel-for in hipSYCL for CPUs by up to multiple orders of magnitude on various CPU architectures. The two alternatives are compared with regard to their functional correctness and performance. With one of the variants upstreamed, hipSYCL is the first SYCL implementation to provide a well-performing nd-range parallel-for on CPU without requiring an available OpenCL runtime.
Citations: 4
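The barrier problem described in the hipSYCL abstract can be illustrated by hand: split the kernel body at the barrier into loops over the work-items of a group, and keep private variables that live across the barrier in per-work-item storage. The sketch below is a hypothetical, manually transformed example in plain C++, not hipSYCL's generated code; the abstract does not detail how its two compiler approaches perform the transformation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Conceptual kernel, in SYCL-like pseudocode (one work-item's view):
//   size_t lid = item.get_local_id(0), gid = item.get_global_id(0);
//   float v = in[gid] * 2;              // private value used after the barrier
//   local_mem[lid] = v;
//   group_barrier(item.get_group());
//   out[gid] = local_mem[local_size - 1 - lid] - v;
//
// Hypothetical CPU version: the body is split at the barrier into two loops
// over the work-items of each group. All work-items finish phase 1 before any
// starts phase 2, which is what the barrier requires, with no extra threads.
void run_group_reverse(const std::vector<float>& in, std::vector<float>& out,
                       std::size_t local_size) {
    assert(local_size > 0 && in.size() % local_size == 0);
    out.assign(in.size(), 0.0f);
    std::vector<float> local_mem(local_size);  // work-group local memory
    std::vector<float> v(local_size);          // private `v`, one slot per work-item
    for (std::size_t group = 0; group < in.size() / local_size; ++group) {
        const std::size_t base = group * local_size;
        // Phase 1: code before the barrier, for every work-item in the group.
        for (std::size_t lid = 0; lid < local_size; ++lid) {
            v[lid] = in[base + lid] * 2.0f;
            local_mem[lid] = v[lid];
        }
        // -- barrier --
        // Phase 2: code after the barrier, for every work-item in the group.
        for (std::size_t lid = 0; lid < local_size; ++lid)
            out[base + lid] = local_mem[local_size - 1 - lid] - v[lid];
    }
}
```

The private variable `v` is the subtle part: because it is read after the barrier, it must survive the split, so it is promoted to an array indexed by work-item. Library-only implementations avoid this transformation by giving each work-item its own lightweight thread, which is the overhead the abstract describes.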
An Overview of OpenCL Vendor Extensions Supported in Qualcomm Adreno GPUs
International Workshop on OpenCL | Pub date: 2022-05-10 | DOI: 10.1145/3529538.3530002
Hongqiang Wang, Balaji Calidas
Abstract: One of the key advantages of OpenCL is its openness and flexibility, as it allows OpenCL vendors to extend the standard OpenCL features or add new features through the extension mechanism. OpenCL allows three types of extensions: KHR extensions, external extensions, and vendor extensions. Vendor extensions are less restrictive than KHR and external extensions, which normally require adoption by multiple vendors or passing conformance tests. This poster focuses on the vendor extensions available solely on the Adreno mobile GPUs in Qualcomm's Snapdragon SoCs (systems-on-chip). Adreno GPUs support a wide range of vendor extensions, and this poster provides a high-level overview of them. More detailed descriptions and examples can be found in [1]. Note that Adreno GPUs come in many tiers and generations with different capabilities. Developers should therefore generally query an extension's availability on the device via API calls such as clGetDeviceInfo before using it, to avoid incompatibility or portability issues in the future.
Citations: 0
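The Adreno abstract advises querying extension availability with clGetDeviceInfo before use. For the CL_DEVICE_EXTENSIONS query, the runtime returns a space-separated list of extension names, so the check reduces to a whole-token match. A minimal sketch follows; the `has_extension` helper and the sample string in the test are hypothetical (`cl_qcom_ext_host_ptr` is used only as an example token, not as a claim about any particular device), and the actual OpenCL calls appear only in a comment so the sketch compiles without an OpenCL SDK.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Returns true if `name` appears as a whole token in `extensions`, the
// space-separated list reported for CL_DEVICE_EXTENSIONS. Whole-token matching
// avoids false positives from prefixes (e.g. "cl_khr_fp" vs "cl_khr_fp16").
bool has_extension(const std::string& extensions, const std::string& name) {
    assert(!name.empty());
    std::istringstream tokens(extensions);
    std::string token;
    while (tokens >> token)
        if (token == name) return true;
    return false;
}

// With a real device, the string would come from the host API, e.g.:
//   size_t size = 0;
//   clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, 0, nullptr, &size);
//   std::string ext(size, '\0');
//   clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, size, ext.data(), nullptr);
//   if (!ext.empty() && ext.back() == '\0') ext.pop_back();  // drop trailing NUL
//   bool ok = has_extension(ext, "cl_qcom_ext_host_ptr");
```

Stripping the trailing NUL matters: otherwise it would be glued to the last token and that extension would never match.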