Customizable and High Performance Matrix Multiplication Kernel on FPGA (Abstract Only)

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays | Pub Date: 2015-02-22 | DOI: 10.1145/2684746.2689147

Jie Wang, J. Cong


Citations: 2

Abstract



Matrix multiplication (MM) is an important kernel in many application domains, including scientific computing, image processing, and machine learning. Numerous accelerator designs have been proposed for higher throughput and energy efficiency. In this paper we present a customizable FPGA accelerator for matrix multiplication. We also develop a design automation flow that, given the matrix size and target FPGA platform, generates the optimal design configuration with the highest throughput. The accelerator can be integrated with HLS tools as a basic parameterizable library component. Experiments show that for 512×512 single-precision MM, we can achieve as high as 358 GFLOPS on the Xilinx Virtex-7 XC7VX485T-2, which outperforms any published state-of-the-art FPGA accelerator design by at least 28.3%.
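To put the reported figures in perspective, the short Python sketch below works through the arithmetic behind them. It is only an illustration and rests on assumptions the abstract does not state: that floating-point operations are counted with the common 2n³ convention (n³ multiplies plus n³ adds), and that "outperforms by at least 28.3%" means a throughput ratio of at least 1.283 over the best prior design. The function names are hypothetical and are not taken from the paper.

```python
def mm_flops(n: int) -> int:
    """Operation count for an n x n by n x n product.

    Assumes the common 2*n^3 convention (n^3 multiplies + n^3 adds);
    the abstract does not say which convention the authors used.
    """
    return 2 * n ** 3


def runtime_seconds(n: int, gflops: float) -> float:
    """Time for one n x n product at a sustained throughput given in GFLOPS."""
    return mm_flops(n) / (gflops * 1e9)


def implied_baseline_gflops(gflops: float, speedup_percent: float) -> float:
    """Upper bound on the prior best throughput.

    Assumes 'outperforms by X%' means new = old * (1 + X/100).
    """
    return gflops / (1.0 + speedup_percent / 100.0)


if __name__ == "__main__":
    n = 512                 # matrix dimension quoted in the abstract
    reported = 358.0        # GFLOPS quoted in the abstract
    margin = 28.3           # percent improvement quoted in the abstract

    print(f"operations per product: {mm_flops(n):,}")
    print(f"time per product: {runtime_seconds(n, reported) * 1e3:.3f} ms")
    print(f"implied prior best: <= {implied_baseline_gflops(reported, margin):.1f} GFLOPS")
```

Under these assumptions, one 512×512 product is about 268 million operations and takes roughly 0.75 ms at 358 GFLOPS, and the best previously published design would have reached at most about 279 GFLOPS.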
