FeatherNet

Raghid Morcel, Hazem M. Hajj, M. Saghir, Haitham Akkary, H. Artail, R. Khanna, A. Keshavamurthy
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Vol. 12, No. 1
Published: 2019-03-28 · DOI: 10.1145/3306202
Citations: 11

Abstract

Convolutional Neural Network (ConvNet or CNN) algorithms are characterized by a large number of model parameters and high computational complexity. These two requirements make implementation on resource-limited FPGAs challenging, and the challenges are magnified for low-end FPGAs. While previous work has demonstrated successful ConvNet implementations on high-end FPGAs, this article presents a ConvNet accelerator design that enables complex deep ConvNet architectures on resource-constrained FPGA platforms aimed at the IoT market. We call the design “FeatherNet” for its light resource utilization. The implementations are VHDL-based, providing flexibility in design optimization. As part of the design process, new methods are introduced to address several design challenges. The first is a novel stride-aware, graph-based method targeted at ConvNets that achieves efficient signal processing with reduced resource utilization. The second addresses the challenge of determining the minimal arithmetic precision needed while preserving high accuracy; for this challenge, we propose variable-width dynamic fixed-point representations combined with a layer-by-layer design-space pruning heuristic across the layers of the deep ConvNet model. The third aims at a modular design that supports different types of ConvNet layers while ensuring low resource utilization; for this challenge, we propose relatively small modules composed of computational filters that can be interconnected to build an entire accelerator design. These model elements can be easily configured through HDL parameters (e.g., layer type, mask size, stride) to meet the needs of a specific ConvNet implementation, and they can thus be reused to implement a wide variety of ConvNet architectures.
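As an illustration of the second method, the sketch below shows one way a variable-width dynamic fixed-point scheme could assign a per-layer binary-point position: for a fixed total bit width, it searches candidate fractional widths and keeps the one that minimizes quantization error. This is a hypothetical Python sketch; the function names and the mean-absolute-error criterion are assumptions for illustration, not the paper's actual heuristic.

```python
# Hypothetical sketch of variable-width dynamic fixed-point quantization.
# "Dynamic" here means each layer gets its own binary-point position;
# the search criterion below is an assumption, not the paper's heuristic.

def quantize(values, total_bits, frac_bits):
    """Round each value onto a signed fixed-point grid with `frac_bits`
    fractional bits out of `total_bits` total bits, saturating at the
    representable range."""
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return [max(lo, min(hi, round(v * scale))) / scale for v in values]

def best_frac_bits(values, total_bits):
    """Pick the fractional width that minimizes mean absolute
    quantization error for this layer's values."""
    best, best_err = 0, float("inf")
    for f in range(total_bits):
        q = quantize(values, total_bits, f)
        err = sum(abs(v - w) for v, w in zip(values, q)) / len(values)
        if err < best_err:
            best, best_err = f, err
    return best

# Example: a layer whose largest weight magnitude (1.2) forces at least
# one integer bit, so the search settles on 6 fractional bits of 8.
weights = [0.75, -0.3125, 0.1, -1.2]
f = best_frac_bits(weights, total_bits=8)
print(f, quantize(weights, 8, f))
```

In a full design-space search, this per-layer choice would be combined with pruning across layers so that low-impact layers can drop to narrower total widths.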
The fourth method addresses design portability between two FPGA vendor platforms, namely Intel/Altera and Xilinx. For this challenge, we propose instantiating the device-specific hardware blocks needed in each computational filter, rather than relying on synthesis tools to infer them, while tracking the similarities and differences between the two platforms. We believe the solutions to these design challenges advance knowledge that can benefit designers and other researchers using similar devices or facing similar challenges. Our results demonstrate success in addressing the design challenges, achieving low (30%) resource utilization on the low-end FPGA platforms Zedboard and Cyclone V. The design overcomes the limitation of designs that target high-end platforms and cannot fit on low-end IoT platforms. Furthermore, our design showed superior performance (measured in Frames/s/W per dollar) compared to optimized high-end designs.
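To illustrate why the stride-aware method (the first method above) matters, the sketch below compares multiply-accumulate (MAC) counts for computing a strided convolution directly against convolving at stride 1 and then subsampling the result. The helper names and the example layer shape (a 224×224 input with an 11×11 mask at stride 4) are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch (not the paper's algorithm): a stride-aware design
# computes only the outputs the stride keeps, whereas a stride-oblivious
# one computes every stride-1 output and discards most of them.

def conv_out_dim(in_dim, mask, stride):
    """Output dimension of a valid (no-padding) convolution."""
    return (in_dim - mask) // stride + 1

def macs(in_dim, mask, stride, channels_in=1, channels_out=1):
    """Multiply-accumulate count for one square conv layer."""
    out = conv_out_dim(in_dim, mask, stride)
    return out * out * mask * mask * channels_in * channels_out

direct = macs(224, 11, stride=4)  # stride-aware: only needed outputs
naive = macs(224, 11, stride=1)   # compute all outputs, then subsample
print(direct, naive, naive // direct)
```

For this layer shape, the stride-aware count is roughly 15× smaller, which is the kind of saving that makes the difference between fitting and not fitting on a low-end device.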