{"title":"MLBlocks: FPGA Blocks for Machine Learning Applications","authors":"Seyedramin Rasoulinezhad, D. Boland, P. Leong","doi":"10.1145/3431920.3439479","DOIUrl":null,"url":null,"abstract":"The underlying goal of FPGA architecture research is to devise flexible substrates which implement a wide variety of circuits efficiently. Contemporary FPGA architectures have been optimized to support networking, signal processing and image processing applications through high precision digital signal processing (DSP) blocks. The recent emergence of machine learning has created a new set of demands characterized by: 1) higher computational density and 2) low precision arithmetic requirements. With the goal of exploring this new design space in a methodical manner, we first propose a problem formulation involving computing nested loops over multiply-accumulate (MAC) operations, which covers many basic linear algebra primitives and standard deep neural network (DNN) layers. A quantitative methodology for deriving efficient coarse-grained compute block architectures from benchmarks is then proposed together with a family of new compute units, called MLBlocks. These blocks are flexible mesh-based systolic array units parameterized with different data movements, data reuse, and multi-precision support. They utilize a columnar arrangement which is compatible with existing FPGA architectures. Finally, using synthetic benchmarks, we demonstrate that MLBlocks offer significantly improved performance over the commercial Xilinx DSP48E2, while maintaining similar area and timing requirements to current DSPs.","PeriodicalId":386071,"journal":{"name":"The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3431920.3439479","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2
Abstract
The underlying goal of FPGA architecture research is to devise flexible substrates that implement a wide variety of circuits efficiently. Contemporary FPGA architectures have been optimized to support networking, signal processing, and image processing applications through high-precision digital signal processing (DSP) blocks. The recent emergence of machine learning has created a new set of demands characterized by (1) higher computational density and (2) low-precision arithmetic requirements. With the goal of exploring this new design space in a methodical manner, we first propose a problem formulation involving computing nested loops over multiply-accumulate (MAC) operations, which covers many basic linear algebra primitives and standard deep neural network (DNN) layers. A quantitative methodology for deriving efficient coarse-grained compute block architectures from benchmarks is then proposed, together with a family of new compute units called MLBlocks. These blocks are flexible mesh-based systolic array units parameterized with different data movements, data reuse, and multi-precision support. They use a columnar arrangement that is compatible with existing FPGA architectures. Finally, using synthetic benchmarks, we demonstrate that MLBlocks offer significantly improved performance over the commercial Xilinx DSP48E2, while maintaining area and timing similar to current DSP blocks.
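To make the nested-loop MAC formulation concrete, the sketch below expresses a standard DNN convolution layer as six nested loops whose innermost statement is a single multiply-accumulate. This is an illustration of the general computational pattern only, not the paper's actual formulation or the MLBlocks hardware; the function name and array layouts are hypothetical. Matrix-vector and matrix-matrix products arise as special cases with fewer loops.

```python
# Illustrative sketch (hypothetical names): a 2-D convolution layer
# written as nested loops over MAC operations, the pattern the
# paper's problem formulation is built around.

def conv2d_mac(inp, weights):
    """Valid-padding 2-D convolution as nested MAC loops.

    inp:     [C_in][H][W]          input feature maps
    weights: [C_out][C_in][K][K]   filter kernels
    returns: [C_out][H-K+1][W-K+1] output feature maps
    """
    c_out = len(weights)
    c_in, k = len(weights[0]), len(weights[0][0])
    h, w = len(inp[0]), len(inp[0][0])
    out = [[[0.0] * (w - k + 1) for _ in range(h - k + 1)]
           for _ in range(c_out)]
    # Six nested loops; the innermost statement is one MAC.
    for co in range(c_out):                   # output channels
        for y in range(h - k + 1):            # output rows
            for x in range(w - k + 1):        # output columns
                for ci in range(c_in):        # input channels
                    for ky in range(k):       # kernel rows
                        for kx in range(k):   # kernel columns
                            out[co][y][x] += (weights[co][ci][ky][kx]
                                              * inp[ci][y + ky][x + kx])
    return out
```

Different orderings and tilings of these loops trade off which operands are moved and which are reused in place, which is the data-movement and data-reuse design space that the MLBlocks parameterization targets.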