Heterogeneous GPU and FPGA computing: a VexCL case-study

2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), published 2022-05-01, DOI: 10.1109/IPDPSW55747.2022.00073

Tristan Laan, A. Varbanescu


Abstract

FPGA-based accelerators are capturing the interest of the HPC domain, primarily due to their superior energy efficiency compared to more common accelerators, like GPUs. However, enabling HPC codes to use FPGA-based accelerators (efficiently) remains a difficult task. One interesting, fast-track solution to this problem is to extend the domain-specific, high-level languages, libraries, or APIs that already support other accelerators (e.g., GPUs) so that they also target FPGAs. In this work we demonstrate the added value of such an approach by adding FPGA support to VexCL, a vector expression template library for OpenCL/CUDA. To this end, we use the VexCL-generated OpenCL code as an intermediate representation, while creating code skeletons to implement the FPGA code and all necessary data links between the host and the accelerator. We further support five generic optimizations for the FPGA code. We demonstrate our approach on two use-cases, an affine transformation and an SpMV calculation, showcasing the performance and energy consumption of the resulting FPGA versions. We further demonstrate that the FPGA code can outperform the VexCL-generated GPU version. To illustrate the integration of GPU and FPGA code, we also demonstrate the performance of a VexCL SpMV application using a heterogeneous GPU+FPGA system. Our results indicate that, indeed, the integration of the two accelerators is seamless. Performance-wise, however, the heterogeneous version does not outperform the FPGA-only one.
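The abstract names two use-cases, an affine transformation and an SpMV calculation, plus a heterogeneous GPU+FPGA SpMV run. The C++ sketch below only illustrates what such kernels compute on the host; it is not the authors' code and not VexCL's API. It assumes an element-wise affine form y_i = a·x_i + b and a sparse matrix stored in CSR format. The `spmv_split` helper, which divides rows between two hypothetical devices by a fixed fraction, is an illustrative assumption: the abstract does not say how work is actually partitioned between the GPU and the FPGA.

```cpp
#include <cstddef>
#include <vector>

// Element-wise affine transformation y_i = a * x_i + b.
// Assumed form for illustration; the paper's exact kernel may differ.
std::vector<double> affine(const std::vector<double>& x, double a, double b) {
    std::vector<double> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) y[i] = a * x[i] + b;
    return y;
}

// Sparse matrix in compressed sparse row (CSR) format.
struct Csr {
    std::size_t rows;
    std::vector<std::size_t> row_ptr; // size rows + 1; row r spans [row_ptr[r], row_ptr[r+1])
    std::vector<std::size_t> col;     // column index of each nonzero
    std::vector<double> val;          // value of each nonzero
};

// Computes y[r] = sum over nonzeros k in row r of val[k] * x[col[k]],
// for every row r in [first, last).
void spmv_rows(const Csr& A, const std::vector<double>& x, std::vector<double>& y,
               std::size_t first, std::size_t last) {
    for (std::size_t r = first; r < last; ++r) {
        double sum = 0.0;
        for (std::size_t k = A.row_ptr[r]; k < A.row_ptr[r + 1]; ++k)
            sum += A.val[k] * x[A.col[k]];
        y[r] = sum;
    }
}

// Hypothetical heterogeneous split: the first `share` fraction of rows is
// assigned to "device 0" (e.g. a GPU), the remaining rows to "device 1"
// (e.g. an FPGA). Both blocks run sequentially on the host here; the sketch
// only shows that row blocks are independent and can be computed separately.
std::vector<double> spmv_split(const Csr& A, const std::vector<double>& x, double share) {
    if (share < 0.0) share = 0.0; // clamp share to [0, 1]
    if (share > 1.0) share = 1.0;
    std::vector<double> y(A.rows, 0.0);
    std::size_t split = static_cast<std::size_t>(share * static_cast<double>(A.rows));
    if (split > A.rows) split = A.rows;
    spmv_rows(A, x, y, 0, split);      // "device 0" row block
    spmv_rows(A, x, y, split, A.rows); // "device 1" row block
    return y;
}
```

Because each CSR row is computed independently, any row split produces the same result as a single-device product. In a real heterogeneous run, the interesting question is how the split fraction affects execution time and energy, which this sketch does not model.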
