Energy-efficient algebra kernels in FPGA for High Performance Computing

J. Comput. Sci. Technol. Pub Date: 2021-10-21. DOI: 10.24215/16666038.21.e09

Federico Favaro, Ernesto Dufrechu, P. Ezzatti, J. Oliver



Abstract



The spread of multi-core architectures and the later irruption of massively parallel devices led to a revolution in High-Performance Computing (HPC) platforms in recent decades. As a result, Field-Programmable Gate Arrays (FPGAs) are re-emerging as a versatile and more energy-efficient alternative to other platforms. Traditional FPGA design relies on low-level Hardware Description Languages (HDLs) such as VHDL or Verilog, which follow an entirely different programming model from standard software languages and require specialized knowledge of the underlying hardware. In recent years, manufacturers have made considerable efforts to provide High-Level Synthesis (HLS) tools in order to enable greater adoption of FPGAs in the HPC community. Our work studies the use of multi-core hardware and different FPGAs to address Numerical Linear Algebra (NLA) kernels such as general matrix multiplication (GEMM) and sparse matrix-vector multiplication (SpMV). Specifically, we compare the behavior of fine-tuned kernels on a multi-core CPU with HLS implementations on FPGAs. We evaluate our implementations experimentally on a low-end and a cutting-edge FPGA platform in terms of runtime and energy consumption, and compare the results against the Intel MKL library on the CPU.
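For readers unfamiliar with the two kernels named in the abstract, the following is a minimal reference sketch of what GEMM and SpMV compute. It is not the authors' HLS or CPU implementation. The function names `gemm` and `spmv_csr` are hypothetical, and the CSR (compressed sparse row) storage format is an assumption, since the abstract does not say which sparse format was used.

```python
def gemm(alpha, A, B, beta, C):
    """Return alpha * (A @ B) + beta * C for dense matrices given as lists of rows."""
    m, k, n = len(A), len(B), len(B[0])
    # Start from the scaled copy of C, then accumulate the scaled product.
    out = [[beta * C[i][j] for j in range(n)] for i in range(m)]
    for i in range(m):
        for p in range(k):
            a_ip = alpha * A[i][p]
            for j in range(n):
                out[i][j] += a_ip * B[p][j]
    return out


def spmv_csr(row_ptr, col_idx, values, x):
    """Return y = A @ x, with A stored in CSR form (assumed format, see lead-in).

    row_ptr[i]..row_ptr[i + 1] delimits the nonzeros of row i;
    col_idx and values hold their column indices and numeric values.
    """
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for nz in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[nz] * x[col_idx[nz]]
    return y


if __name__ == "__main__":
    A = [[1.0, 2.0], [3.0, 4.0]]
    B = [[5.0, 6.0], [7.0, 8.0]]
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    print(gemm(1.0, A, B, 0.0, zeros))

    # CSR encoding of [[10, 0, 0], [0, 0, 20], [30, 0, 40]]
    print(spmv_csr([0, 1, 2, 4], [0, 2, 0, 2], [10.0, 20.0, 30.0, 40.0], [1.0, 2.0, 3.0]))
```

The contrast between the two loops hints at why the kernels behave differently on hardware: GEMM accesses memory in a regular, predictable pattern, whereas SpMV reads `x` through the indirection `col_idx[nz]`, which makes its memory access irregular.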

