International Workshop on OpenCL: Latest Publications

C++ for OpenCL 2021
International Workshop on OpenCL Pub Date : 2022-05-10 DOI: 10.1145/3529538.3529981
Justas Janickas, Anastasia Stulova
Citations: 0
On the Compilation Performance of Current SYCL Implementations
International Workshop on OpenCL Pub Date : 2022-05-10 DOI: 10.1145/3529538.3529548
Peter Thoman, Facundo Molina Heredia, T. Fahringer
Abstract: The Khronos SYCL abstraction layer is designed to enable programming heterogeneous platforms, consisting of host and accelerator devices, with a single-source code base. In order to allow for a high level of abstraction while still providing competitive runtime performance, both SYCL implementations and the software ecosystems built around SYCL applications frequently make heavy use of C++ templates. A potential consequence of this design choice, as well as the need to generate code for both a host and at least one device architecture, are significant compilation times. In this work we set out to study the relative compile-time performance and the impact of various SYCL features on compilation times across a selection of the most widely-used SYCL implementations. To this end, we introduce a code generator which creates SYCL kernels stressing various API features and instruction types, either in isolation or in combination, as well as an infrastructure to largely automate related experiments. We apply this infrastructure in a large-scale synthetic evaluation totaling 96000 compiler runs, which also includes a study of the compilation performance over time of the most widespread implementations. In addition to these synthetic experiments, we validate the applicability of our findings by measuring the compile times of two real-world industrial SYCL applications. On the basis of these experiments, we point out particularly impactful (in terms of compile-time performance) changes during the development of some SYCL implementations, and formulate suggestions for SYCL implementation developers as well as users. We have made both the code generator and all the tools we developed to carry out the experiments in this paper available as open source.
Citations: 2
Experiences Porting NAMD to the Data Parallel C++ Programming Model.
International Workshop on OpenCL Pub Date : 2022-05-01 Epub Date: 2022-05-10 DOI: 10.1145/3529538.3529560
David J Hardy, Jaemin Choi, Wei Jiang, Emad Tajkhorshid
Abstract: HPC applications have a growing need to leverage heterogeneous computing resources with a vendor-neutral programming paradigm. Data Parallel C++ is a programming language based on the open standard SYCL, providing a vendor-neutral solution. We describe our experiences porting the NAMD molecular dynamics application with its GPU-offload force kernels to SYCL/DPC++. Results are shown that demonstrate correctness of the porting effort.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10276636/pdf/nihms-1892994.pdf
Citations: 0
Benchmarking a Proof-of-Concept Performance Portable SYCL-based Fast Fourier Transformation Library
International Workshop on OpenCL Pub Date : 2022-03-17 DOI: 10.1145/3529538.3529996
V. Pascuzzi, M. Goli
Abstract: In this paper, we present an early version of a SYCL-based FFT library, capable of running on all major vendor hardware, including CPUs and GPUs from AMD, ARM, Intel and NVIDIA. The current limitations of our library are that it supports single-dimension FFTs up to 2^11 in length and base-2 input sequences. Although preliminary, the aim of this work is to seed further developments for a rich set of features for calculating FFTs. The library has the advantage over existing portable FFT libraries in that it is single-source, and therefore removes the complexities that arise due to abundant use of pre-processor macros and auto-generated kernels to target different architectures. We exercise two SYCL-enabled compilers, Codeplay ComputeCpp and Intel’s open-source LLVM project, to evaluate performance portability of our SYCL-based FFT on various heterogeneous architectures. We provide studies comparing our portable library with highly optimized vendor-specific FFT libraries, and discuss potential sources hindering performance.
Citations: 1
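To make the base-2 restriction mentioned in the abstract above concrete, here is a minimal sketch of a textbook recursive radix-2 FFT in plain host C++. It is not taken from the SYCL library described in the paper, and it says nothing about how that library is implemented; the function name `fft_radix2` and the alias `cplx` are hypothetical names chosen for illustration only.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;  // hypothetical alias for this sketch

// Recursive radix-2 decimation-in-time FFT (forward transform, no scaling).
// The input length must be a power of two, which is what a base-2
// restriction on input sequences amounts to.
std::vector<cplx> fft_radix2(const std::vector<cplx>& x) {
    const std::size_t n = x.size();
    assert(n != 0 && (n & (n - 1)) == 0);  // power-of-two lengths only
    if (n == 1) return x;

    // Split into even- and odd-indexed halves and transform each half.
    std::vector<cplx> even(n / 2), odd(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) {
        even[i] = x[2 * i];
        odd[i] = x[2 * i + 1];
    }
    const std::vector<cplx> e = fft_radix2(even);
    const std::vector<cplx> o = fft_radix2(odd);

    // Combine the halves using twiddle factors exp(-2*pi*i*k/n).
    const double pi = std::acos(-1.0);
    std::vector<cplx> out(n);
    for (std::size_t k = 0; k < n / 2; ++k) {
        const double angle =
            -2.0 * pi * static_cast<double>(k) / static_cast<double>(n);
        const cplx t = std::polar(1.0, angle) * o[k];
        out[k] = e[k] + t;
        out[k + n / 2] = e[k] - t;
    }
    return out;
}
```

Calling `fft_radix2` on a vector of length 2^11 = 2048 would correspond to the largest transform size the abstract mentions; any length that is not a power of two trips the assertion.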
International Workshop on OpenCL
International Workshop on OpenCL Pub Date : 2022-01-01 DOI: 10.1145/3529538
Citations: 0
Enabling the Use of C++20 Unseq Execution Policy for OpenCL
International Workshop on OpenCL Pub Date : 2021-04-27 DOI: 10.1145/3456669.3456674
Po-Yao Chang, Tai-Liang Chen, Jenq-Kuen Lee
Abstract: This work facilitates the usage of unsequenced execution policy as seen in C++20 standard library with the newly introduced OpenCL kernel language, C++ for OpenCL. By passing unseq, a global object of type unsequenced_policy, as an argument to selected C++ parallel algorithms, the function would then be vectorized with the help of clang and LLVM. This work complements the introduction of C++ for OpenCL, which brings the core language part of C++17 to OpenCL while leaving out the standard library part. In the best case, we see a whopping 6.9-fold speedup.
Citations: 2
Toward a Better Defined SYCL Memory Consistency Model
International Workshop on OpenCL Pub Date : 2021-04-27 DOI: 10.1145/3456669.3456696
Ben Ashbaugh, James C. Brodman, M. Kinsner, G. Lueck, S. Pennycook, Roland Schulz
Abstract: A memory consistency model is a key component of a parallel programming model that describes guaranteed behavior for applications and valid optimizations for implementers. The SYCL 2020 specification took a step forward by adopting the atomic_ref syntax from the C++20 specification and concepts similar to memory scopes from the OpenCL 2.0 specification, though further efforts to formalize the SYCL memory model are ongoing and will be progressed in future specifications. This technical presentation will summarize the guarantees and several unexpected non-guarantees that are provided by the memory model in the SYCL 2020 specification, using accessible language and examples. The talk will describe memory models from other parallel programming models that could inform and influence the SYCL memory model, including the C++, OpenCL 2.0, and Vulkan memory models. The talk will also describe features unique to the SYCL specification that will need to be included in the SYCL memory model, such as unified shared memory, which introduce challenges that have not been solved in existing memory models.
Citations: 1
Accelerating Regular-Expression Matching on FPGAs with High-Level Synthesis
International Workshop on OpenCL Pub Date : 2021-04-27 DOI: 10.1145/3456669.3456716
Devon Callanan, Luke Kljucaric, A. George
Abstract: The importance of security infrastructures for high-throughput networks has rapidly grown as a result of expanding internet traffic and increasingly high-bandwidth connections. Intrusion-detection systems (IDSs) such as SNORT rely upon rule sets designed to alert system administrators of malicious packets. Such deep-packet inspection, which depends upon regular-expression searches, can be accelerated on programmable-logic (PL) architectures using non-deterministic finite automata (NFAs). Prior designs have relied upon register-transfer level (RTL) design descriptions and achieved efficient resource utilization through fine-grained optimizations. New advances made by field-programmable gate array (FPGA) vendors have led to more powerful compiler toolchains for OpenCL that allow for rapid development on PL architectures while generating competitive designs in terms of performance. The goal of this research is to evaluate performance differences between a custom, OpenCL-based, acceleration architecture for regular expressions and comparable RTL designs. The simplicity of the application, which requires only basic hardware building blocks, adds to the novelty of the comparison. In contrast to RTL-based solutions, which show frequency degradation with bandwidth scaling, our approach is able to maintain stable and high operating frequencies at the cost of resource usage. By scaling input bandwidth with multi-character transformations, throughput in excess of 17 Gbps can be achieved on Intel’s Arria 10 Programmable Acceleration Card, outperforming similar designs with RTL as reported in the literature.
Citations: 2
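The NFA-based matching that the abstract above refers to can be sketched in software with the classic bit-parallel Shift-And technique, in which each bit of a machine word plays the role of one NFA state. This is only an illustrative host-side C++ sketch for a literal pattern, under the assumption that one bit per state is a fair analogy for one hardware register per state; it is not the paper's OpenCL architecture, and the function name `nfa_contains` is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Bit-parallel (Shift-And) simulation of the linear NFA for a literal pattern.
// Bit i of `active` is set when the first i+1 pattern characters have just
// been matched, i.e. NFA state i is live. All states advance in one step per
// input character. Returns true if `pattern` occurs anywhere in `text`.
bool nfa_contains(const std::string& pattern, const std::string& text) {
    assert(!pattern.empty() && pattern.size() <= 64);  // one state per bit

    // mask[c] has bit i set when pattern[i] == c.
    std::array<std::uint64_t, 256> mask{};
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        mask[static_cast<unsigned char>(pattern[i])] |= std::uint64_t{1} << i;
    }

    const std::uint64_t accept = std::uint64_t{1} << (pattern.size() - 1);
    std::uint64_t active = 0;
    for (char ch : text) {
        // Advance every live state by one position, re-arm the start state,
        // then keep only the states whose expected character equals ch.
        active = ((active << 1) | std::uint64_t{1}) &
                 mask[static_cast<unsigned char>(ch)];
        if (active & accept) return true;
    }
    return false;
}
```

For example, `nfa_contains("abc", "xxabcxx")` reports a match, while `nfa_contains("abc", "ababab")` does not. This sketch consumes one character per step; the multi-character transformations mentioned in the abstract, which process several characters per step, are not modeled here.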
On measuring the maturity of SYCL implementations by tracking historical performance improvements
International Workshop on OpenCL Pub Date : 2021-04-27 DOI: 10.1145/3456669.3456701
Wei-Chen Lin, Tom Deakin, Simon McIntosh-Smith
Abstract: SYCL is a platform agnostic, single-source, C++ based, parallel programming framework for developing platform independent software for heterogeneous systems. As an emerging framework, SYCL has been under active development for several years, with multiple implementations available from hardware vendors and others. A crucial metric for potential adopters is how mature these implementations are; are they still improving rapidly, indicating that the space is still quite immature, or has performance improvement plateaued, potentially indicating a mature market? This study presents a historical study of the performance delivered by all major SYCL implementations on a range of supported platforms. We use existing HPC-style mini-apps written in SYCL, and benchmark these on current and historical revisions of each SYCL implementation, revealing the rate of change of performance improvements over time. The data indicates that most SYCL implementations are now quite mature, showing rapid performance improvements in the past, slowing to more modest performance improvements more recently. We also compare the most recent SYCL performance to existing well established frameworks, such as OpenCL and OpenMP.
Citations: 7
Experiences Porting the SU3_Bench Microbenchmark to the Intel Arria 10 and Xilinx Alveo U280 FPGAs
International Workshop on OpenCL Pub Date : 2021-04-27 DOI: 10.1145/3456669.3456671
D. Doerfler, Farzad Fatollahi-Fard, Colin MacLean, T. Nguyen, Samuel Williams, N. Wright, Marco Siracusa
Abstract: In this study we investigate the implications of porting a common computational kernel used in high performance computing, which has been optimized for efficient execution on general purpose graphics processing units (GPUs), to a field programmable gate array (FPGA). In particular, we use a benchmark based on a matrix-matrix multiply kernel commonly used in lattice quantum chromodynamics applications. The microbenchmark is based on the OpenCL programming language. We evaluate the performance and portability aspects associated with two FPGAs, the Intel Arria 10 and the Xilinx Alveo U280. The purpose of the study is not to compare the two FPGAs, but to evaluate their respective OpenCL toolchains and to evaluate the level of effort needed to port a GPU optimized code to a FPGA, and the effectiveness of the respective toolchains. We did find the toolchains to be relatively easy to use, and it was possible to get correctness with little effort, but there was significant effort needed to get relatively good performance. We found that FPGAs perform best when using single work item kernels, as opposed to the nominal multiple work item NDRange kernel used for CPUs and GPUs. In addition, other source code changes were necessary, and in particular the lack of a local cache in FPGA architectures can require a significant rewrite of the code. The performance achieved with the Intel Arria 10 was 47.6% of its maximum sustained bandwidth, while the Xilinx Alveo U280 achieved 35.2%. GPU architectures have been shown to demonstrate 75% to 90% architectural efficiencies.
Citations: 0