{"title":"IMEC: A Memory-Efficient Convolution Algorithm For Quantised Neural Network Accelerators","authors":"Eashan Wadhwa, Shashwat Khandelwal, Shanker Shreejith","doi":"10.1109/ASAP54787.2022.00027","DOIUrl":null,"url":null,"abstract":"Quantised convolution neural networks (QCNNs) on FPGAs have shown tremendous potential for deploying deep learning on resource constrained devices closer to the data source or in embedded applications. An essential building block of (Q)CNNs are the convolutional layers. FPGA implementations use modified versions of convolution kernels to reduce the resource overheads using variations of the sliding kernel algorithm. While these alleviate resource consumption to a certain degree, they still incur considerable (distributed) memory resources, requiring the use of larger FPGA devices with sufficient on-chip memory elements to implement deep QCNNs. In this paper, we present the Inverse Memory Efficient Convolution (IMEC) algorithm, a novel strategy to lower the memory consumption of convolutional layers in QCNNs. IMEC lowers the footprint of intermediate matrix buffers incurred within the convolutional layers and the multiply-accumulate (MAC) operators required at each layer through a series of data organisation and computational optimisations. We evaluate IMEC by integrating it into the BNN-PYNQ framework that can compile high-level QCNN representations to the FPGA bitstream. Our results show that IMEC can optimise memory footprint and the overall resource overhead of the convolutional layers by ~33% and ~20% (LUT and FF count) respectively, across multiple quantisation levels (1-bit to 8-bit), while maintaining identical inference accuracy as the state-of-the-art QCNN implementations.","PeriodicalId":207871,"journal":{"name":"2022 IEEE 33rd International Conference on Application-specific Systems, Architectures and Processors (ASAP)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 33rd International Conference on Application-specific Systems, Architectures and Processors (ASAP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASAP54787.2022.00027","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Quantised convolutional neural networks (QCNNs) on FPGAs have shown tremendous potential for deploying deep learning on resource-constrained devices, closer to the data source or in embedded applications. An essential building block of (Q)CNNs is the convolutional layer. FPGA implementations reduce its resource overhead through modified convolution kernels based on variations of the sliding-kernel algorithm. While these alleviate resource consumption to a degree, they still consume considerable (distributed) memory resources, requiring larger FPGA devices with sufficient on-chip memory elements to implement deep QCNNs. In this paper, we present the Inverse Memory Efficient Convolution (IMEC) algorithm, a novel strategy to lower the memory consumption of convolutional layers in QCNNs. Through a series of data-organisation and computational optimisations, IMEC lowers the footprint of the intermediate matrix buffers incurred within the convolutional layers and of the multiply-accumulate (MAC) operators required at each layer. We evaluate IMEC by integrating it into the BNN-PYNQ framework, which compiles high-level QCNN representations to an FPGA bitstream. Our results show that IMEC reduces the memory footprint of the convolutional layers by ~33% and their overall resource overhead (LUT and FF count) by ~20%, across multiple quantisation levels (1-bit to 8-bit), while maintaining inference accuracy identical to state-of-the-art QCNN implementations.
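The abstract does not describe IMEC's internal data organisation, so the following is only a minimal C sketch of the conventional sliding-kernel convolution with an 8-bit quantised multiply-accumulate, i.e. the baseline whose buffering cost IMEC targets. All dimensions, type widths, and names here are illustrative assumptions, not taken from the paper.

```c
/* Minimal sketch of a plain sliding-kernel 2-D convolution with an
 * 8-bit quantised MAC. This is NOT the IMEC algorithm, only the
 * conventional baseline the abstract refers to; sizes are hypothetical. */
#include <stdint.h>

#define IN_H  8   /* input feature-map height (assumed) */
#define IN_W  8   /* input feature-map width  (assumed) */
#define K     3   /* kernel size              (assumed) */
#define OUT_H (IN_H - K + 1)
#define OUT_W (IN_W - K + 1)

/* Slide a KxK kernel over the input; each output pixel is one MAC chain.
 * Note the whole input feature map must be buffered so the window can
 * slide over it -- this intermediate buffering is the memory cost that
 * IMEC's data-organisation optimisations aim to reduce. */
void conv2d_q8(const int8_t in[IN_H][IN_W],
               const int8_t kernel[K][K],
               int32_t out[OUT_H][OUT_W])
{
    for (int r = 0; r < OUT_H; r++) {
        for (int c = 0; c < OUT_W; c++) {
            int32_t acc = 0;               /* wide accumulator for 8-bit MACs */
            for (int i = 0; i < K; i++)
                for (int j = 0; j < K; j++)
                    acc += (int32_t)in[r + i][c + j] * (int32_t)kernel[i][j];
            out[r][c] = acc;
        }
    }
}
```

At lower quantisation levels (down to 1-bit, as evaluated in the paper) the operand types shrink and the MACs simplify further, which is why frameworks such as BNN-PYNQ map them efficiently to FPGA LUTs and FFs.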