MLBlocks: FPGA Blocks for Machine Learning Applications

Seyedramin Rasoulinezhad, D. Boland, P. Leong
{"title":"MLBlocks: FPGA Blocks for Machine Learning Applications","authors":"Seyedramin Rasoulinezhad, D. Boland, P. Leong","doi":"10.1145/3431920.3439479","DOIUrl":null,"url":null,"abstract":"The underlying goal of FPGA architecture research is to devise flexible substrates which implement a wide variety of circuits efficiently. Contemporary FPGA architectures have been optimized to support networking, signal processing and image processing applications through high precision digital signal processing (DSP) blocks. The recent emergence of machine learning has created a new set of demands characterized by: 1) higher computational density and 2) low precision arithmetic requirements. With the goal of exploring this new design space in a methodical manner, we first propose a problem formulation involving computing nested loops over multiply-accumulate (MAC) operations, which covers many basic linear algebra primitives and standard deep neural network (DNN) layers. A quantitative methodology for deriving efficient coarse-grained compute block architectures from benchmarks is then proposed together with a family of new compute units, called MLBlocks. These blocks are flexible mesh-based systolic array units parameterized with different data movements, data reuse, and multi-precision support. They utilize a columnar arrangement which is compatible with existing FPGA architectures. Finally, using synthetic benchmarks, we demonstrate that MLBlocks offer significantly improved performance over the commercial Xilinx DSP48E2, while maintaining similar area and timing requirements to current DSPs.","PeriodicalId":386071,"journal":{"name":"The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3431920.3439479","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

The underlying goal of FPGA architecture research is to devise flexible substrates which implement a wide variety of circuits efficiently. Contemporary FPGA architectures have been optimized to support networking, signal processing and image processing applications through high precision digital signal processing (DSP) blocks. The recent emergence of machine learning has created a new set of demands characterized by: 1) higher computational density and 2) low precision arithmetic requirements. With the goal of exploring this new design space in a methodical manner, we first propose a problem formulation involving computing nested loops over multiply-accumulate (MAC) operations, which covers many basic linear algebra primitives and standard deep neural network (DNN) layers. A quantitative methodology for deriving efficient coarse-grained compute block architectures from benchmarks is then proposed together with a family of new compute units, called MLBlocks. These blocks are flexible mesh-based systolic array units parameterized with different data movements, data reuse, and multi-precision support. They utilize a columnar arrangement which is compatible with existing FPGA architectures. Finally, using synthetic benchmarks, we demonstrate that MLBlocks offer significantly improved performance over the commercial Xilinx DSP48E2, while maintaining similar area and timing requirements to current DSPs.
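To make the nested-loop MAC formulation concrete, the sketch below expresses a small matrix multiply as a loop nest of multiply-accumulate operations in C. This is an illustrative example only, not the paper's exact formulation; the dimension names M, N, K and the int8/int32 precisions are assumptions, chosen to reflect the low-precision arithmetic the abstract highlights.

```c
/* Illustrative sketch (not the paper's formulation): C = A * B written
 * as nested loops over multiply-accumulate (MAC) operations. Loop nests
 * of this shape cover many linear algebra primitives and standard DNN
 * layers. M, N, K and the int8/int32 types are assumptions. */
#include <stdint.h>

#define M 4
#define N 4
#define K 4

void matmul_mac(const int8_t a[M][K], const int8_t b[K][N],
                int32_t c[M][N])
{
    for (int i = 0; i < M; i++) {
        for (int j = 0; j < N; j++) {
            int32_t acc = 0;  /* wide accumulator for low-precision inputs */
            for (int k = 0; k < K; k++) {
                /* one MAC operation per innermost iteration */
                acc += (int32_t)a[i][k] * (int32_t)b[k][j];
            }
            c[i][j] = acc;
        }
    }
}
```

Convolutional and fully-connected DNN layers reduce to loop nests of the same shape, which is why a single compute block parameterized over data movement and reuse can serve this whole family of workloads.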