Proceedings of the 5th International Workshop on OpenCL: Latest Publications

OpenCL Interoperability with OpenVX Graphs

Proceedings of the 5th International Workshop on OpenCL Pub Date : 2017-05-16 DOI: 10.1145/3078155.3078183

Ben Ashbaugh, A. Bernal

Citations: 1

CUDA-on-CL: a compiler and runtime for running NVIDIA® CUDA™ C++11 applications on OpenCL™ 1.2 Devices

Proceedings of the 5th International Workshop on OpenCL Pub Date : 2017-05-16 DOI: 10.1145/3078155.3078156

Hugh Perkins

Abstract: In the machine learning domain, frameworks are predominantly written and maintained in the NVIDIA® CUDA™ language. There have been attempts to port these frameworks to OpenCL®, notably the ports of the Caffe framework by Gu et al., Tschopp, and Engel, and of the Torch framework by Perkins. The authors of these ports found merging their work into the mainstream framework challenging, and maintain their forks as separate branches or repositories. CUDA-on-CL addresses this problem by leaving the reference implementation entirely in NVIDIA CUDA, both host-side and device-side, and providing a compiler and a runtime component, so that any CUDA C++11 application can in theory be compiled and run on any OpenCL 1.2 device. We use the TensorFlow framework as a case study, and demonstrate the ability to run unary, binary and reduction TensorFlow and Eigen kernels with no modification to the original CUDA source code. Performance studies are undertaken using the TensorFlow kernels. For buffer sizes of 1MB or more, performance is comparable between CUDA and CUDA-on-CL across unary operations, binary operations and single-axis reductions. Full reduction is around 14 times slower on CUDA-on-CL than on CUDA. We think this may be because of the absence of the low-level hardware shfl operation. The asymptotic time for zero buffer sizes is double that of CUDA, possibly because of the overhead of additional kernel boilerplate needed to work around limitations in the OpenCL 1.2 standard.

Citations: 10

The Windsor Build and Testing Framework

Proceedings of the 5th International Workshop on OpenCL Pub Date : 2017-05-16 DOI: 10.1145/3078155.3078184

Shane M. Peelar, P. Preney

Citations: 0

Assessing the feasibility of OpenCL CPU implementations for agent-based simulations

Proceedings of the 5th International Workshop on OpenCL Pub Date : 2017-05-16 DOI: 10.1145/3078155.3078174

Nuno Fachada, A. Rosa

Citations: 3

Challenges and Opportunities in Native GPU Debugging

Proceedings of the 5th International Workshop on OpenCL Pub Date : 2017-05-16 DOI: 10.1145/3078155.3078158

Jeff McAllister, Uri Levy

Citations: 0

Creating High Performance Applications with Intel's FPGA OpenCL™ SDK

Proceedings of the 5th International Workshop on OpenCL Pub Date : 2017-05-16 DOI: 10.1145/3078155.3078169

A. Ling, U. Aydonat, Shane O'Connell, D. Capalija, Gordon R. Chiu

Citations: 6

Production-CL library for iterative scientific calculations

Proceedings of the 5th International Workshop on OpenCL Pub Date : 2017-05-16 DOI: 10.1145/3078155.3078162

P. Kartsev

Citations: 4

Compiler Techniques for Efficient MATLAB to OpenCL Code Generation

Proceedings of the 5th International Workshop on OpenCL Pub Date : 2017-05-16 DOI: 10.1145/3078155.3078186

Luís Reis, João Bispo, João MP Cardoso

Citations: 1

SYCL-BLAS: Leveraging Expression Trees for Linear Algebra

Proceedings of the 5th International Workshop on OpenCL Pub Date : 2017-05-16 DOI: 10.1145/3078155.3078189

J. Aliaga, Ruymán Reyes, M. Goli

Citations: 10

SYCL C++ and OpenCL interoperability experimentation with triSYCL

Proceedings of the 5th International Workshop on OpenCL Pub Date : 2017-05-16 DOI: 10.1145/3078155.3078188

Anastasios Doumoulakis, R. Keryell, Kenneth O'Brien

Abstract: Heterogeneous computing is required in systems ranging from low-end embedded systems up to high-end HPC systems to reach high performance while keeping power consumption low. Having more and more accelerators and CPUs also creates challenges for programmers, demanding even more expertise from them. Fortunately, modern C++-based domain-specific languages, such as the SYCL open standard from the Khronos Group, simplify programming at the full system level while keeping high performance. SYCL is a single-source programming model providing a task graph of heterogeneous kernels that can be run on various accelerators or even just the CPU. Memory heterogeneity is abstracted through buffer objects, and memory usage is abstracted with accessor objects. From these accessors, the task graph is implicitly constructed, and the synchronizations and data movements across the various physical memories are done automatically, in contrast to OpenCL or CUDA. Some applications or libraries already exist that use the OpenCL standard, and some OpenCL kernels are provided either as OpenCL kernel source code or even as built-in OpenCL kernels written in RTL for extreme optimization on FPGA. SYCL provides an OpenCL interoperability mode to reuse existing OpenCL code while keeping the higher-level task graph programming model, without needing explicit memory transfers. We present some experiments on two applications on GPU and FPGA with the triSYCL open-source implementation to show the benefits of this OpenCL interoperability mode.

Citations: 9