Toward a CAD Tool for SYCL programming

E. Fabiani, Loïc Lagadec, Alexandre Skrzyniarz, Chiara Relevat, Erell Cottour, Paul Allaire
Proceedings of the 2023 International Workshop on OpenCL. Published 2023-04-18. DOI: 10.1145/3585341.3585358 (https://doi.org/10.1145/3585341.3585358)

Abstract

This poster discusses the design and operation of a CAD tool for the SYCL standard. Heterogeneous platforms that combine multicore CPUs, FPGAs, GPGPUs, manycore processors, and CGRAs on the same chip are of growing importance. Fully exploiting the available heterogeneous computing power requires mastering the complexity of the different underlying computing models and their interactions. The SYCL standard addresses this challenge by standardizing the definition of compute kernels and memory transfers, which simplifies porting kernels between devices on the same platform and reusing them on different platforms. This indeed simplifies the programming of heterogeneous platforms, yet design space exploration, in particular the choice of kernel granularity and of the device on which each kernel executes, remains a major concern. Moreover, expressing dependencies between kernels through memory transfers is error-prone. Conversely, automatic code generation from directly expressed relationships between kernels tackles this issue.
The objective of our framework is to design methods and tools that solve SYCL program design problems by following a model-driven engineering methodology. Seeking high performance during the design phase always induces additional complexity. Our claim is that this complexity can be gradually absorbed by a sound development methodology that relies on automatic step-by-step refinement during implementation.
Two main problems are highlighted here. The first concerns the correspondence between the abstract model of kernel dependencies and their expression in SYCL. Defining dependencies via memory transfers limits the analysis of existing code, and hence its evolution and the continuous verification of its structural compliance with the initial specifications. Conversely, automatically generating the interaction code from a high-level model increases productivity and reliability.
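To make the first problem concrete, the following C++ sketch shows how an ordering between command groups has to be recovered indirectly, by comparing the buffers they access. It is a simplified illustration, not the SYCL runtime and not the authors' tool. The `Mode`, `Access`, and `Group` types and the `implicit_edges` function are hypothetical names. The rule used (two command groups conflict when they touch the same buffer and at least one of them writes) is an assumption that only approximates accessor-based ordering.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for accessor declarations.
enum class Mode { read, write, read_write };

struct Access {
    std::string buffer;  // name of the buffer being accessed
    Mode mode;           // how the command group accesses it
};

struct Group {
    std::string kernel;            // kernel name, treated as a black box
    std::vector<Access> accesses;  // accessors requested by the command group
};

// True when both accesses touch the same buffer and at least one writes.
bool conflicts(const Access& a, const Access& b) {
    return a.buffer == b.buffer &&
           (a.mode != Mode::read || b.mode != Mode::read);
}

// Derive (earlier, later) index pairs from submission order and accessors.
// Every conflicting pair is reported, so the result may contain edges that
// are already implied transitively by other edges.
std::vector<std::pair<std::size_t, std::size_t>>
implicit_edges(const std::vector<Group>& groups) {
    std::vector<std::pair<std::size_t, std::size_t>> edges;
    for (std::size_t later = 1; later < groups.size(); ++later) {
        for (std::size_t earlier = 0; earlier < later; ++earlier) {
            bool dep = false;
            for (const Access& a : groups[earlier].accesses)
                for (const Access& b : groups[later].accesses)
                    dep = dep || conflicts(a, b);
            if (dep) edges.emplace_back(earlier, later);
        }
    }
    return edges;
}
```

In this sketch the dependency graph is never written down by the programmer. It exists only as a consequence of buffer names and access modes, so checking it against a specification requires this kind of reverse analysis.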
The second problem concerns the exploration of the space of possible solutions for an application, which arises because a kernel can be implemented on several hardware targets. This raises the question of choosing the best solutions for a given case study by generating and scoring different variants. Programming these variants manually is time-consuming and error-prone, and naturally leads to pruning the design space, which leaves aside some non-trivial good candidates. An environment that generates them automatically is part of the solution.
The core of our framework is a high-level object model of all the kernels of an application and their dependencies. This model is made up of classes that reflect the structure of a SYCL program, seen as a graph whose nodes are command groups that depend on the data they manipulate. The Data class, which characterizes an accessor, is associated with a variable (type, number of dimensions, size of each dimension) and provides its name and access mode (read, write, read_write). When the model is constructed dynamically, the access mode is inferred automatically from the expressed dependencies. The Command_group class characterizes a command group: it designates a kernel and the device on which to run it. An instance also contains a list of Data objects that correspond to the accessors used by the kernel.
The model does not express the internal code of the kernels, which are treated as black boxes. Their SYCL code (in the form of a lambda function) is stored in independent files, each referenced by a unique name. The Data objects associated with a command group must be of the same nature as the parameters used in the kernel code. The Node class encapsulates a Command_group and makes it possible to describe the dependencies between command groups as a directed graph (class Graph). The Graph class automatically labels the nodes with a possible order of execution that respects the expressed dependencies.
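The object model described above can be sketched in plain C++ as follows. Only the class names `Data`, `Command_group`, `Node`, and `Graph` come from the text. Every member, the `Access_mode` enum, the `add`, `depends`, and `label` methods, and the choice of Kahn's topological sort for the labeling are assumptions made for illustration, not the framework's actual interface.

```cpp
#include <cstddef>
#include <queue>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

enum class Access_mode { read, write, read_write };

// Characterizes an accessor: the variable it refers to and its access mode.
struct Data {
    std::string name;               // variable name
    std::string type;               // element type, for example "float"
    std::vector<std::size_t> dims;  // size of each dimension
    Access_mode mode;
};

// A command group: a kernel (by unique name), a device, and its accessors.
struct Command_group {
    std::string kernel_name;  // refers to a kernel stored in its own file
    std::string device;
    std::vector<Data> data;
};

// Wraps a command group so that it can take part in a dependency graph.
struct Node {
    Command_group group;
    std::vector<std::size_t> successors;  // nodes that must run after this one
    int label = -1;                       // execution rank, -1 until labeled
};

class Graph {
public:
    // Adds a command group and returns the index of its node.
    std::size_t add(Command_group g) {
        Node n;
        n.group = std::move(g);
        nodes_.push_back(std::move(n));
        return nodes_.size() - 1;
    }

    // Declares that node `later` depends on node `earlier`.
    void depends(std::size_t later, std::size_t earlier) {
        if (later >= nodes_.size() || earlier >= nodes_.size())
            throw std::out_of_range("unknown node index");
        nodes_[earlier].successors.push_back(later);
    }

    // Labels nodes with one valid execution order (Kahn's algorithm).
    void label() {
        for (Node& n : nodes_) n.label = -1;
        std::vector<std::size_t> pending(nodes_.size(), 0);
        for (const Node& n : nodes_)
            for (std::size_t s : n.successors) ++pending[s];
        std::queue<std::size_t> ready;
        for (std::size_t i = 0; i < nodes_.size(); ++i)
            if (pending[i] == 0) ready.push(i);
        int next = 0;
        while (!ready.empty()) {
            std::size_t i = ready.front();
            ready.pop();
            nodes_[i].label = next++;
            for (std::size_t s : nodes_[i].successors)
                if (--pending[s] == 0) ready.push(s);
        }
        if (static_cast<std::size_t>(next) != nodes_.size())
            throw std::logic_error("dependency cycle");
    }

    const Node& node(std::size_t i) const { return nodes_.at(i); }
    std::size_t size() const { return nodes_.size(); }

private:
    std::vector<Node> nodes_;
};
```

With this sketch, a caller adds one `Command_group` per kernel, declares edges with `depends`, and calls `label` once. Sorting the nodes by `label` then gives one order that respects every declared dependency. A cycle is reported as an error because no such order exists.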
This labeling is mandatory for the code generation phase, which requires the command groups to be declared in sequence.
There are two ways to express a SYCL program in our framework in order to generate the associated object. (1) By using a SYCL code parser (on a restricted subset of the language), which promotes the reuse of existing code and its integration into the framework. The parser isolates the kernel code in independent files, analyzes the dependencies between kernels, and expresses them in the generated graph. (2) By modifying (or writing) a C++ program that uses the instance creation methods to express the dependencies between command groups.
The framework comes with a viewer for the dependency graph of command group executions (based on Graphviz), which displays either the complete graph or one device at a time. This helps to check the conformity of the program with the specifications, or to analyze the structure of an existing SYCL program once it has been processed by the parser.
One key objective of the project is to allow the automatic transformation of code by integrating it into the abstract model. Currently, this is done in the following two cases: The modification of the structure and characteristics of an existing program (dependencies, kernels used, devices, accessors) without modifying the code itself. This is implemented via a human-machine interface (HMI). One typical benefit would be the on-demand addition of monitoring functionality, for example to measure performance or integrate assertions. In the current version, the user can request code generation with the execution time of each command group as a metric. This is achieved through the automatic creation of events. The SYCL code is then generated by a method that extracts all the information contained in a Graph object and integrates the code of the kernels. The order in which command groups are declared derives from the automatically computed labeling.
Prospects for further development of the framework include: