SYCLomatic compatibility library: making migration to SYCL easier

Andy Huang
Proceedings of the 2023 International Workshop on OpenCL, published 2023-04-18
DOI: 10.1145/3585341.3585349 (https://doi.org/10.1145/3585341.3585349)
Citations: 2

Abstract

SYCL [1] is a royalty-free, cross-platform C++ abstraction layer and programming model for heterogeneous computing. It provides the necessary programming interfaces, such as device, queue, kernel, and memory interfaces (buffer and accessor), as well as features like unified shared memory (USM). As a programming model for heterogeneous computing, Intel oneAPI [2] provides a SYCL compiler and runtime to support SYCL kernel-based programming, and a set of optimized libraries to support API-based programming.

SYCLomatic [3] is a project that helps developers migrate existing code written in other programming languages to the SYCL C++ heterogeneous programming model. It supports source-to-source migration of existing CUDA application source code to SYCL source code, using SYCL interfaces and the optimized libraries provided by Intel oneAPI. One of the major challenges for SYCLomatic is that, because of API differences, expressing the exact semantics of a single line of CUDA code in SYCL sometimes requires additional data structures or several lines of operations. To assist migration and keep the migrated code performant and maintainable, SYCLomatic implements a compatibility library consisting of additions to the SYCL interfaces and a set of compatible APIs for popular libraries. The compatibility library does not depend on SYCLomatic and can be used as a standalone library for SYCL programming. In this talk, we share the reasons for creating the compatibility library and describe its design.

Addressing Semantic Differences: The first part of the compatibility library addresses semantic differences with CUDA code by introducing new classes that add functionality to SYCL interfaces such as device, queue, malloc, and image accessor.

(1) Utility features to access queues in different devices and threads: Keeping a sycl::device pointer and passing it around between host functions is tedious.
In the compatibility library, a singleton device manager class tracks which device each CPU thread is using. The device manager class makes the following features easy to provide:
(a) Get the “current” device in a thread: The class keeps a map from each thread to the device it last used, which makes it easier to reach the intended device from a host function.
(b) Get the default queue for a device: When offloading a task to a device, SYCL requires the developer to create a new queue on the device if a pointer to a previously created queue is not available. The class keeps a default queue for each device that is available globally, and provides a convenient interface to retrieve it.
(c) Device-level operations (create queue, synchronize, reset): The class records every queue creation and maps each queue to its device, so device-level synchronization is easy to achieve.

(2) Pointer-like memory operations for non-USM mode: Because managing memory through pointer operations can be more convenient in some cases, the library emulates pointers on top of sycl::buffer, providing pointer-like memory operations (malloc, free, pointer arithmetic, etc.) for devices that do not support USM.

(3) Flexible interface to fetch image data: The compatibility library introduces a class that simplifies fetching image data, e.g., extracting 1 or 2 channels from the image accessor.

Compatible APIs: The second part of the compatibility library provides syntactic sugar for frequently used API calls.

(1) Free functions for atomic operations: With sycl::atomic_ref, performing an atomic operation takes two steps: (a) constructing the sycl::atomic_ref, and (b) executing the atomic operation on it. The compatibility library introduces a set of templated atomic calls that let developers simplify their code.
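
The bookkeeping behind the device manager described in the first part can be modeled in a few lines. The sketch below is plain C++ rather than SYCL, so it compiles without a SYCL toolchain. `Queue`, `DeviceManager`, every member name, and the fixed count of two devices are hypothetical stand-ins chosen for illustration, not the compatibility library's actual API. It shows only what the abstract describes: a singleton, a per-thread record of the last selected device, and one default queue per device.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <stdexcept>
#include <thread>
#include <vector>

// Hypothetical stand-in for sycl::queue: it only remembers its device index.
struct Queue {
  std::size_t device_index;
};

// Hypothetical device manager; names are illustrative, not the library's API.
class DeviceManager {
 public:
  // Singleton access (a function-local static is initialized thread-safely).
  static DeviceManager& instance() {
    static DeviceManager mgr(2);  // assumption: two devices in this sketch
    return mgr;
  }

  // Record the device that the calling thread uses from now on.
  void select_device(std::size_t index) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (index >= default_queues_.size()) {
      throw std::out_of_range("no such device");
    }
    current_[std::this_thread::get_id()] = index;
  }

  // "Current" device of the calling thread; device 0 if none was selected.
  std::size_t current_device() {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = current_.find(std::this_thread::get_id());
    return it == current_.end() ? 0 : it->second;
  }

  // Default queue of the calling thread's current device.
  Queue& default_queue() {
    std::size_t dev = current_device();
    // The vector is never resized after construction, so the reference stays valid.
    return default_queues_[dev];
  }

 private:
  explicit DeviceManager(std::size_t device_count) {
    for (std::size_t i = 0; i < device_count; ++i) {
      default_queues_.push_back(Queue{i});
    }
  }

  std::mutex mutex_;
  std::map<std::thread::id, std::size_t> current_;  // thread -> last selected device
  std::vector<Queue> default_queues_;               // one default queue per device
};
```

With this shape, a host function can call `DeviceManager::instance().default_queue()` without receiving a device or queue pointer from its caller, which is the convenience the abstract attributes to the device manager.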
(2) Utility classes to simplify device memory allocation: sycl::malloc cannot be used directly to allocate a multi-dimensional array, and creating a device-accessible static or global variable takes multiple steps. A device memory class performs the allocation, keeps the dimension information, and provides:
(a) a simple interface to allocate a multi-dimensional array and pass it to the device;
(b) a simple interface to create a static or global variable that can be accessed on the device.

(3) 2D and 3D memory operations (USM, non-USM): The compatibility library provides free functions for 2D and 3D memory operations such as allocation, memory copy, and memory set, to save developers effort.

(4) Compatible APIs for popular CUDA libraries: Libraries such as BLAS (Basic Linear Algebra Subprograms), CCL (Collective Communication Library), DNN (Deep Neural Network Library), STL algorithms, and FFT (Fast Fourier Transform) are widely used in heterogeneous applications. Although the Intel oneAPI package provides these libraries with SYCL interfaces, libraries from different implementations that offer similar core functionality differ in API design. The compatibility library contains APIs that bridge these usage differences and let developers implement SYCL applications with the interfaces they are more familiar with.

Since SYCL is a relatively young language specification, many existing heterogeneous computing applications, libraries, and frameworks may not yet support a SYCL interface. With the compatibility library addressing some of the syntactic and semantic differences between SYCL and other heterogeneous computing languages, developers should be able to create SYCL-based libraries and frameworks with less effort.
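
To make the 2D memory operations in item (3) more concrete, the sketch below models a pitched 2D allocation and a row-by-row 2D copy in plain C++ on host memory. It is not the compatibility library's interface: `Pitched2D`, `alloc_2d`, `copy_2d`, and the 32-byte row alignment are assumptions chosen for illustration. A real implementation would allocate and copy through SYCL (USM or buffers) rather than `std::vector` and `std::memcpy`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical 2D allocation whose rows are padded to `pitch` bytes.
struct Pitched2D {
  std::vector<unsigned char> storage;
  std::size_t pitch = 0;        // bytes from the start of one row to the next
  std::size_t width_bytes = 0;  // bytes actually used in each row
  std::size_t height = 0;       // number of rows

  unsigned char* row(std::size_t y) { return storage.data() + y * pitch; }
  const unsigned char* row(std::size_t y) const {
    return storage.data() + y * pitch;
  }
};

// Round `n` up to the next multiple of `alignment` (alignment must be > 0).
inline std::size_t round_up(std::size_t n, std::size_t alignment) {
  return (n + alignment - 1) / alignment * alignment;
}

// Allocate `height` rows of `width_bytes` bytes, padding each row to the alignment.
inline Pitched2D alloc_2d(std::size_t width_bytes, std::size_t height,
                          std::size_t alignment = 32) {
  Pitched2D a;
  a.pitch = round_up(width_bytes, alignment);
  a.width_bytes = width_bytes;
  a.height = height;
  a.storage.assign(a.pitch * height, 0);
  return a;
}

// Copy `height` rows of `width_bytes` bytes between buffers with different pitches.
inline void copy_2d(void* dst, std::size_t dst_pitch, const void* src,
                    std::size_t src_pitch, std::size_t width_bytes,
                    std::size_t height) {
  auto* d = static_cast<unsigned char*>(dst);
  const auto* s = static_cast<const unsigned char*>(src);
  for (std::size_t y = 0; y < height; ++y) {
    std::memcpy(d + y * dst_pitch, s + y * src_pitch, width_bytes);
  }
}
```

Keeping the pitch and dimensions next to the storage is the point of the sketch: a caller passes one object instead of a raw pointer plus separate width, height, and pitch values, which mirrors the role the abstract gives to the device memory class and the 2D/3D free functions.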
To improve the functionality and usability of the compatibility library, work remains, such as making the compatibility library coexist with SYCL-implemented components with respect to device selection, queue activation, task synchronization, etc., and addressing interface differences for more APIs from popular CUDA libraries.

Notices & Disclaimers: Intel technologies may require enabled hardware, software or service activation. No product or component can be absolutely secure. Your costs and results may vary. Intel, the Intel logo and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others. SYCL is a registered trademark of the Khronos Group, Inc.