Heterogeneous GPU and FPGA computing: a VexCL case-study
Tristan Laan, A. Varbanescu
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2022-05-01
DOI: 10.1109/IPDPSW55747.2022.00073 (https://doi.org/10.1109/IPDPSW55747.2022.00073)
Citations: 0
Abstract
FPGA-based accelerators are capturing the interest of the HPC domain, primarily due to their superior energy efficiency compared to more common accelerators, like GPUs. However, enabling HPC codes to use FPGA-based accelerators efficiently remains a difficult task. One interesting, fast-track solution to this problem is to extend the domain-specific, high-level languages, libraries, or APIs that already support other accelerators (e.g., GPUs) to target FPGAs. In this work we demonstrate the added value of such an approach by adding FPGA support to VexCL, a vector expression template library for OpenCL/CUDA. To this end, we use the VexCL-generated OpenCL code as an intermediate representation, while creating code skeletons to implement the FPGA code and all necessary data links between the host and accelerator. We further support five generic optimizations for the FPGA code. We demonstrate our approach on two use cases, an affine transformation and an SpMV calculation, showcasing the performance and energy consumption of the resulting FPGA versions. We further demonstrate that the FPGA code can outperform the VexCL-generated GPU version. To illustrate the integration of GPU and FPGA code, we also demonstrate the performance of a VexCL SpMV application using a heterogeneous GPU+FPGA system. Our results indicate that, indeed, the integration of the two accelerators is seamless. Performance-wise, however, the heterogeneous version does not outperform the FPGA-only one.
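For readers unfamiliar with the two use cases named in the abstract, the following is a minimal CPU reference sketch in plain C++. It is not the authors' code and does not use the VexCL API. The abstract does not specify the exact form of the affine transformation or the sparse storage format, so the scalar coefficients `a` and `b`, the CSR layout, and the names `affine`, `CsrMatrix`, and `spmv` are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Assumed form of the affine transformation: y[i] = a * x[i] + b,
// with scalar coefficients a and b (hypothetical choice, not from the paper).
std::vector<double> affine(double a, const std::vector<double>& x, double b) {
    std::vector<double> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = a * x[i] + b;
    }
    return y;
}

// Sparse matrix in compressed sparse row (CSR) form (assumed format).
struct CsrMatrix {
    std::size_t rows;                  // number of rows
    std::vector<std::size_t> row_ptr;  // rows + 1 offsets into col_idx/values
    std::vector<std::size_t> col_idx;  // column index of each stored value
    std::vector<double> values;        // stored nonzero values
};

// Sparse matrix-vector product (SpMV): y = A * x.
std::vector<double> spmv(const CsrMatrix& A, const std::vector<double>& x) {
    std::vector<double> y(A.rows, 0.0);
    for (std::size_t r = 0; r < A.rows; ++r) {
        double sum = 0.0;
        // Accumulate the contributions of the nonzeros stored for row r.
        for (std::size_t k = A.row_ptr[r]; k < A.row_ptr[r + 1]; ++k) {
            sum += A.values[k] * x[A.col_idx[k]];
        }
        y[r] = sum;
    }
    return y;
}
```

The affine kernel performs independent element-wise work, whereas SpMV reads `x` through indirect, data-dependent indices; the sketch only illustrates the computations themselves and says nothing about how they are mapped onto a GPU or an FPGA.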