Jaume Bosch, Antonio Filgueras, Miquel Vidal Piñol, Daniel Jiménez-González, C. Álvarez, X. Martorell
Exploiting Parallelism on GPUs and FPGAs with OmpSs
ANDARE '17, published 2017-09-09
DOI: https://doi.org/10.1145/3152821.3152880
Citations: 14
Abstract
This paper presents the OmpSs approach to heterogeneous programming on GPU and FPGA accelerators. The OmpSs programming model is built on the Mercurium compiler and the Nanos++ runtime. Applications are annotated with compiler directives that specify task-based parallelism. The Mercurium compiler transforms the code to exploit parallelism on the SMP host cores and to spawn work on CUDA/OpenCL devices and FPGA accelerators. For CUDA/OpenCL devices, the programmer only needs to insert the annotations and provide the kernel function, which is compiled by the native CUDA/OpenCL compiler. For FPGAs, OmpSs uses the FPGA vendors' High-Level Synthesis tools to generate the IP configurations for the FPGA. We present the performance that OmpSs obtains on the matrix multiply benchmark on the Xilinx Zynq UltraScale+.
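To make the annotation style concrete, the following is a minimal sketch of a blocked matrix multiply written as OmpSs tasks. It is not the code evaluated in the paper. The block size `BS`, the function names `matmul_block` and `matmul`, the blocked storage layout, and the exact clause choices (`device(fpga)`, `copy_deps`, and the `[size]pointer` dependence expressions) are assumptions made for illustration. A plain C compiler with OpenMP support disabled ignores the pragmas and runs the code sequentially; building with Mercurium is what would turn each call to `matmul_block` into a task.

```c
#include <assert.h>
#include <stddef.h>

#define BS 4 /* block edge length; hypothetical value chosen for illustration */

/* Assumed OmpSs annotations: run this task on an FPGA accelerator and let the
   runtime copy the blocks named in the dependence clauses to and from it. */
#pragma omp target device(fpga) copy_deps
#pragma omp task in([BS*BS]a, [BS*BS]b) inout([BS*BS]c)
void matmul_block(const float *a, const float *b, float *c)
{
    for (int i = 0; i < BS; ++i) {
        for (int j = 0; j < BS; ++j) {
            float sum = c[i * BS + j];
            for (int k = 0; k < BS; ++k)
                sum += a[i * BS + k] * b[k * BS + j];
            c[i * BS + j] = sum;
        }
    }
}

/* Computes C += A * B. Each matrix is an nb x nb grid of blocks, and every
   BS x BS block is stored contiguously (an assumed layout). */
void matmul(size_t nb, const float *A, const float *B, float *C)
{
    const size_t blk = (size_t)BS * BS; /* elements per block */

    assert(A != NULL && B != NULL && C != NULL);

    for (size_t i = 0; i < nb; ++i)
        for (size_t j = 0; j < nb; ++j)
            for (size_t k = 0; k < nb; ++k)
                /* Under OmpSs each call becomes a task; the inout dependence
                   on the C block orders the k iterations that update it. */
                matmul_block(&A[(i * nb + k) * blk],
                             &B[(k * nb + j) * blk],
                             &C[(i * nb + j) * blk]);

    /* Wait for all outstanding tasks before returning to the caller. */
    #pragma omp taskwait
}
```

In this sketch the host code contains no explicit data transfers or synchronization beyond the final `taskwait`. The dependence clauses are the only information the runtime would need to order the tasks and move the blocks, which is the division of labour the abstract describes between the programmer's annotations and the Mercurium/Nanos++ toolchain.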