Proceedings of the 5th International Workshop on OpenCL: Latest Papers

OpenCL Interoperability with OpenVX Graphs
Proceedings of the 5th International Workshop on OpenCL Pub Date : 2017-05-16 DOI: 10.1145/3078155.3078183
Ben Ashbaugh, A. Bernal
{"title":"OpenCL Interoperability with OpenVX Graphs","authors":"Ben Ashbaugh, A. Bernal","doi":"10.1145/3078155.3078183","DOIUrl":"https://doi.org/10.1145/3078155.3078183","url":null,"abstract":"OpenVX is a computer vision framework that enables embedded and real-time applications to optimize computer vision processing for performance and power. OpenVX addresses system-level optimizations by making use of a graph-based computational API. Although this gives a clear advantage over other traditional computer vision libraries such as OpenCV, which mainly addresses kernel-level optimizations, OpenVX still relies on vendor implementations to optimize individual built-in kernels. OpenVX implements several computer vision kernels, but to increase adoption and user flexibility, OpenVX added support for C-based user-kernels, which by default are single-threaded; there is no particular way to accelerate these kernels or offload the computation to an accelerator such as a GPU. The user has to do the heavy lifting of supporting a multi-threaded implementation. We propose two different OpenVX API extensions to allow developers to deploy accelerated user-kernels using OpenCL.","PeriodicalId":267581,"journal":{"name":"Proceedings of the 5th International Workshop on OpenCL","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115105581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
CUDA-on-CL: a compiler and runtime for running NVIDIA® CUDA™ C++11 applications on OpenCL™ 1.2 Devices
Proceedings of the 5th International Workshop on OpenCL Pub Date : 2017-05-16 DOI: 10.1145/3078155.3078156
Hugh Perkins
{"title":"CUDA-on-CL: a compiler and runtime for running NVIDIA® CUDA™ C++11 applications on OpenCL™ 1.2 Devices","authors":"Hugh Perkins","doi":"10.1145/3078155.3078156","DOIUrl":"https://doi.org/10.1145/3078155.3078156","url":null,"abstract":"In the machine learning domain, machine learning frameworks are predominantly written and maintained in the NVIDIA® CUDA™ language. There have been attempts to port these frameworks to OpenCL®, notably the ports of the Caffe framework by Gu et al; Tschopp; and Engel; and of the Torch framework by Perkins. The authors of these frameworks found merging their work into the mainstream framework challenging, and maintain their forks as separate branches or repositories. CUDA-on-CL addresses this problem by leaving the reference implementation entirely in NVIDIA CUDA, both host-side and device-side, and providing a compiler and a runtime component, so that any CUDA C++11 application can in theory be compiled and run on any OpenCL 1.2 device. We use the Tensorflow framework as a case-study, and demonstrate the ability to run unary, binary and reduction Tensorflow and Eigen kernels, with no modification to the original CUDA source-code. Performance studies are undertaken, using the Tensorflow kernels. For buffer sizes of 1MB or more, performance is comparable between CUDA and CUDA-on-CL, across unary operations, binary operations and single-axis reductions. Full reduction is around 14 times slower on CUDA-on-CL than on CUDA. We think this may be because of the absence of the low-level hardware shfl operation. The asymptotic time for zero buffer sizes is double that of CUDA, possibly because of the overhead of additional kernel boilerplate needed to work around limitations in the OpenCL 1.2 standard.","PeriodicalId":267581,"journal":{"name":"Proceedings of the 5th International Workshop on OpenCL","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130494139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
The Windsor Build and Testing Framework
Proceedings of the 5th International Workshop on OpenCL Pub Date : 2017-05-16 DOI: 10.1145/3078155.3078184
Shane M. Peelar, P. Preney
{"title":"The Windsor Build and Testing Framework","authors":"Shane M. Peelar, P. Preney","doi":"10.1145/3078155.3078184","DOIUrl":"https://doi.org/10.1145/3078155.3078184","url":null,"abstract":"Khronos open source components, including the ICD and Clang compiler, require significant time and effort to manually download, build, and install. Source code updates to these components require recompilation, and developers must repeat error-prone steps to build new test environments. Ideally developers should be able to use a tool that automatically obtains, builds, and installs OpenCL codes, libraries, and tools. The Windsor Build and Testing Framework (WBTF) is a tool that has been developed at the University of Windsor that does this. This paper discusses how the WBTF works, demonstrates how it is used, and shows how OpenCL C and C++ programs can be built, run, and/or used to perform various header-only, link, and/or conformance-style tests using OpenCL reference, host-installed, or device-installed headers and libraries. Those interested in OpenCL C/C++ development, the Khronos OpenCL Clang compiler, and in writing conformance tests will be interested in this framework.","PeriodicalId":267581,"journal":{"name":"Proceedings of the 5th International Workshop on OpenCL","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133213902","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Assessing the feasibility of OpenCL CPU implementations for agent-based simulations
Proceedings of the 5th International Workshop on OpenCL Pub Date : 2017-05-16 DOI: 10.1145/3078155.3078174
Nuno Fachada, A. Rosa
{"title":"Assessing the feasibility of OpenCL CPU implementations for agent-based simulations","authors":"Nuno Fachada, A. Rosa","doi":"10.1145/3078155.3078174","DOIUrl":"https://doi.org/10.1145/3078155.3078174","url":null,"abstract":"Agent-based modeling (ABM) is a bottom-up modeling approach, where each entity of the system being modeled is uniquely represented as a self-determining agent. Large scale emergent behavior in ABMs is population sensitive. As such, it is advisable that the number of agents in a simulation is able to reflect the reality of the system being modeled. This means that in domains such as social modeling, ecology, and biology, systems can contain millions or billions of individuals. Such large scale simulations are only feasible in non-distributed scenarios when the computational power of commodity processors, such as GPUs and multi-core CPUs, is fully exploited. In this paper we evaluate the feasibility of using CPU-oriented OpenCL for high-performance simulations of agent-based models. We compare a CPU-oriented OpenCL implementation of a reference ABM against a parallel Java version of the same model. We show that there are considerable gains in using CPU-based OpenCL for developing and implementing ABMs, with speedups up to 10x over the parallel Java version on a 10-core hyper-threaded CPU.","PeriodicalId":267581,"journal":{"name":"Proceedings of the 5th International Workshop on OpenCL","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124142591","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Challenges and Opportunities in Native GPU Debugging
Proceedings of the 5th International Workshop on OpenCL Pub Date : 2017-05-16 DOI: 10.1145/3078155.3078158
Jeff McAllister, Uri Levy
{"title":"Challenges and Opportunities in Native GPU Debugging","authors":"Jeff McAllister, Uri Levy","doi":"10.1145/3078155.3078158","DOIUrl":"https://doi.org/10.1145/3078155.3078158","url":null,"abstract":"In this technical session, we present the open architectural design of the debugger and how it fits into the OpenCL JIT compilation flow. We showcase how to work natively with the debugger to solve functional bugs, as well as low-level debugging techniques at the SIMD thread level that help to solve complex issues such as misaligned or out-of-range accesses to local/global memory, stack overflows, and illegal instructions. Finally, we cover the challenges in debugging.","PeriodicalId":267581,"journal":{"name":"Proceedings of the 5th International Workshop on OpenCL","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115260310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Creating High Performance Applications with Intel's FPGA OpenCL™ SDK
Proceedings of the 5th International Workshop on OpenCL Pub Date : 2017-05-16 DOI: 10.1145/3078155.3078169
A. Ling, U. Aydonat, Shane O'Connell, D. Capalija, Gordon R. Chiu
{"title":"Creating High Performance Applications with Intel's FPGA OpenCL™ SDK","authors":"A. Ling, U. Aydonat, Shane O'Connell, D. Capalija, Gordon R. Chiu","doi":"10.1145/3078155.3078169","DOIUrl":"https://doi.org/10.1145/3078155.3078169","url":null,"abstract":"After decades of research, High-Level Synthesis has finally caught on as a mainstream design technique for FPGAs. However, achieving performance results that are comparable to designing at a hardware description level still remains a challenge. In this talk, we illustrate how we achieve world class performance results on HPC applications by using OpenCL. Specifically, we show how we achieve 1Tflop of performance on a matrix multiply and over 1.3Tflops on a CNN application, run on Intel's 20nm Arria 10 FPGA device. By leveraging specific coding styles, we show how you can achieve peak performance on the FPGA without having to resort to tedious hardware design languages. Finally, we will describe spatial coding techniques that lead to efficient structures, such as systolic-arrays, to ensure that the FPGA runs efficiently.","PeriodicalId":267581,"journal":{"name":"Proceedings of the 5th International Workshop on OpenCL","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121932909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Production-CL library for iterative scientific calculations
Proceedings of the 5th International Workshop on OpenCL Pub Date : 2017-05-16 DOI: 10.1145/3078155.3078162
P. Kartsev
{"title":"Production-CL library for iterative scientific calculations","authors":"P. Kartsev","doi":"10.1145/3078155.3078162","DOIUrl":"https://doi.org/10.1145/3078155.3078162","url":null,"abstract":"The Production-CL library for iterative scientific calculations with OpenCL is presented. The main goal is to eliminate long, repetitive lines of standard code that slow down the development process, and to implement the typical workflow elements for simulating physics problems. The main entities of the PCL library are: (i) the kernel (called with a single line resembling a CUDA kernel invocation) and (ii) the batch of kernels (to help construct the complex step of each iteration). In addition, PCL implements the procedures that are standard for scientific calculations 'in production': a typical cycle of iterations with a main step and regular saving/loading of the whole state, to preserve work. As an example of library application, we show and compare several projects developed with different approaches.","PeriodicalId":267581,"journal":{"name":"Proceedings of the 5th International Workshop on OpenCL","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129256312","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Compiler Techniques for Efficient MATLAB to OpenCL Code Generation
Proceedings of the 5th International Workshop on OpenCL Pub Date : 2017-05-16 DOI: 10.1145/3078155.3078186
Luís Reis, João Bispo, João MP Cardoso
{"title":"Compiler Techniques for Efficient MATLAB to OpenCL Code Generation","authors":"Luís Reis, João Bispo, João MP Cardoso","doi":"10.1145/3078155.3078186","DOIUrl":"https://doi.org/10.1145/3078155.3078186","url":null,"abstract":"MATLAB is a high-level language used in various scientific and engineering fields. Deployment of well-tested MATLAB code to production would be highly desirable, but in practice a number of obstacles prevent this, notably performance and portability. Although MATLAB-to-C compilers exist, the performance of the generated C code may not be sufficient and thus it is important to research alternatives, such as CPU parallelism, GPGPU computing and FPGAs. OpenCL is an API and programming language that allows targeting these devices, hence the motivation for MATLAB-to-OpenCL compilation. In this paper, we describe our recent efforts on offloading code to OpenCL devices in the context of our MATLAB to C/OpenCL compiler.","PeriodicalId":267581,"journal":{"name":"Proceedings of the 5th International Workshop on OpenCL","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129303035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
SYCL-BLAS: Leveraging Expression Trees for Linear Algebra
Proceedings of the 5th International Workshop on OpenCL Pub Date : 2017-05-16 DOI: 10.1145/3078155.3078189
J. Aliaga, Ruymán Reyes, M. Goli
{"title":"SYCL-BLAS: Leveraging Expression Trees for Linear Algebra","authors":"J. Aliaga, Ruymán Reyes, M. Goli","doi":"10.1145/3078155.3078189","DOIUrl":"https://doi.org/10.1145/3078155.3078189","url":null,"abstract":"In the current landscape of C++ applications, there is an increasing need of including different levels of support for heterogeneous platforms, where multiple specialised devices collaborate to execute an application. In this context, the SYCL standard[8] has been published by Khronos, providing a C++ abstraction layer on top of OpenCL[9] that enables single-source programming for a large number of heterogeneous devices. SYCL single-source programming and task data-flow approach enable developers to leverage modern programming techniques on heterogeneous platforms. In this paper, we present SYCL-BLAS, a BLAS implementation using SYCL that uses Expression Tree templates to generate BLAS kernels. This technique is then used to demonstrate seamless kernel fusion via composition of tree nodes. We also demonstrate how SYCL can be used to quickly develop libraries for heterogeneous systems by providing sufficient levels of abstraction.","PeriodicalId":267581,"journal":{"name":"Proceedings of the 5th International Workshop on OpenCL","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129733810","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
SYCL C++ and OpenCL interoperability experimentation with triSYCL
Proceedings of the 5th International Workshop on OpenCL Pub Date : 2017-05-16 DOI: 10.1145/3078155.3078188
Anastasios Doumoulakis, R. Keryell, Kenneth O'Brien
{"title":"SYCL C++ and OpenCL interoperability experimentation with triSYCL","authors":"Anastasios Doumoulakis, R. Keryell, Kenneth O'Brien","doi":"10.1145/3078155.3078188","DOIUrl":"https://doi.org/10.1145/3078155.3078188","url":null,"abstract":"Heterogeneous computing is required in systems ranging from low-end embedded systems up to high-end HPC systems to reach high performance while keeping power consumption low. Having more and more accelerators and CPUs also creates challenges for programmers, requiring even more expertise from them. Fortunately, new modern C++-based domain-specific languages, such as the SYCL open standard from Khronos Group, simplify the programming at the full system level while keeping high performance. SYCL is a single-source programming model providing a task graph of heterogeneous kernels that can be run on various accelerators or even just the CPU. The memory heterogeneity is abstracted through buffer objects and the memory usage is abstracted with accessor objects. From these accessors, the task graph is implicitly constructed, and the synchronizations and the data movements across the various physical memories are done automatically, in contrast to OpenCL or CUDA. Sometimes, some applications or libraries already exist using the OpenCL standard or some OpenCL kernels are provided, either as OpenCL kernel source code or even as built-in OpenCL kernels written in RTL for extreme optimization on FPGA. SYCL provides an OpenCL interoperability mode to reuse existing OpenCL code while keeping the higher level task graph programming model without needing explicit memory transfers. We present some experiments on two applications on GPU and FPGA with the triSYCL open-source implementation to show the benefits of this OpenCL interoperability mode.","PeriodicalId":267581,"journal":{"name":"Proceedings of the 5th International Workshop on OpenCL","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125850670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9