Accelerating Binary-Matrix Multiplication on FPGA
Debjyoti Bhattacharjee, A. Chattopadhyay, Ricardo Jack Liwongan
2019 32nd IEEE International System-on-Chip Conference (SOCC), published 2019-09-01
DOI: 10.1109/SOCC46988.2019.1570544215 (https://doi.org/10.1109/SOCC46988.2019.1570544215)
Citations: 0
Abstract
Matrix multiplication is required in a wide variety of applications, including data mining, linear algebra, and graph transformations. Most existing work on accelerating matrix multiplication has focused on matrices with floating-point elements. In this work, we propose, for the first time, an FPGA-based accelerator architecture for binary matrix multiplication. It consists of processing elements (PEs) laid out in a regular tiled manner and connected by a torus communication structure. We undertook a detailed experimental study of the proposed architecture. The architecture shows excellent scalability as the number of processing elements increases, with minimal drop in operating frequency. The proposed system achieves a maximum throughput of 1120 Gops for a $4 \times 4$ network with a $2048 \times 2048$ matrix. This performance is considerably higher than that of existing floating-point matrix multiplication accelerators on FPGAs, owing to a PE design optimized for binary matrix multiplication.
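To make the idea of tiled processing elements on a torus concrete, the following is a minimal software sketch, not the architecture described in the paper. It rests on two assumptions the abstract does not confirm: that "binary" means arithmetic over GF(2) (AND as multiply, XOR as add), and that blocks circulate between PEs in the style of Cannon's algorithm, a classic schedule for torus-connected grids. All function names (`naive_gf2`, `block_mul_acc`, `split_blocks`, `cannon_gf2`) are hypothetical and chosen only for illustration.

```python
from typing import List

Matrix = List[List[int]]  # square matrix, every entry is 0 or 1


def naive_gf2(a: Matrix, b: Matrix) -> Matrix:
    """Reference product over GF(2): AND for multiply, XOR for add (assumed semantics)."""
    n = len(a)
    return [[sum(a[i][k] & b[k][j] for k in range(n)) & 1 for j in range(n)]
            for i in range(n)]


def block_mul_acc(c: Matrix, a: Matrix, b: Matrix) -> None:
    """In place c ^= a * b over GF(2); stands in for the work of one PE in one step."""
    n = len(a)
    for i in range(n):
        for k in range(n):
            if a[i][k]:
                for j in range(n):
                    c[i][j] ^= b[k][j]


def split_blocks(m: Matrix, p: int) -> List[List[Matrix]]:
    """Cut an n x n matrix into a p x p grid of (n/p) x (n/p) blocks."""
    bs = len(m) // p
    return [[[row[j * bs:(j + 1) * bs] for row in m[i * bs:(i + 1) * bs]]
             for j in range(p)] for i in range(p)]


def cannon_gf2(a: Matrix, b: Matrix, p: int) -> Matrix:
    """Cannon-style schedule on a p x p torus of PEs (an assumed schedule, not the paper's)."""
    n = len(a)
    if n % p != 0:
        raise ValueError("matrix size must be divisible by the grid size")
    bs = n // p
    A, B = split_blocks(a, p), split_blocks(b, p)
    # Initial skew: row i of A rotates left by i, column j of B rotates up by j.
    A = [[A[i][(j + i) % p] for j in range(p)] for i in range(p)]
    B = [[B[(i + j) % p][j] for j in range(p)] for i in range(p)]
    C = [[[[0] * bs for _ in range(bs)] for _ in range(p)] for _ in range(p)]
    for _ in range(p):
        # Every PE multiplies the two blocks it currently holds.
        for i in range(p):
            for j in range(p):
                block_mul_acc(C[i][j], A[i][j], B[i][j])
        # Torus communication: A blocks move one PE left, B blocks one PE up,
        # wrapping around at the grid edges.
        A = [[A[i][(j + 1) % p] for j in range(p)] for i in range(p)]
        B = [[B[(i + 1) % p][j] for j in range(p)] for i in range(p)]
    # Stitch the result blocks back into one n x n matrix.
    return [[bit for j in range(p) for bit in C[i][j][r]]
            for i in range(p) for r in range(bs)]


if __name__ == "__main__":
    a = [[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 1], [0, 0, 1, 1]]
    b = [[1, 1, 0, 0], [0, 1, 1, 0], [1, 0, 1, 1], [0, 1, 0, 1]]
    print(cannon_gf2(a, b, 2) == naive_gf2(a, b))
```

In this sketch each PE only ever exchanges blocks with its immediate neighbours, which is the property that makes a wraparound (torus) interconnect a natural fit. If the intended semantics were Boolean (OR instead of XOR), only the accumulation in `block_mul_acc` and the final reduction in `naive_gf2` would change.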