Pipeline ShiftAddNet: An FPGA-Based CNN Implementation With Low Hardware Consumption Targeting Constrained Devices

IF 1.6 · CAS Tier 3 (Engineering & Technology) · JCR Q3, Engineering, Electrical & Electronic
Wei-Pau Kiat, Wai Kong Lee, Hung-Khoon Tan, Hui-Fuang Ng
DOI: 10.1002/cta.4419
Journal: International Journal of Circuit Theory and Applications, vol. 53, no. 9, pp. 5538–5547
Published: 2025-01-02 (Journal Article)
URL: https://onlinelibrary.wiley.com/doi/10.1002/cta.4419
Citations: 0

Abstract


ShiftAddNet is a recently proposed multiplier-less CNN that replaces conventional multiplication with cheaper shift and add operations, making it well suited to hardware implementation. In this paper, we present the first FPGA inference core for ShiftAddNet, which achieves low area consumption and fast computation. ShiftAddNet combines the convolutional layers of DeepShift-PS (denoted Shift-Accumulate, sac) and AdderNet (denoted Add-Accumulate, aac) into a single computational stage. As a result, there are data dependencies between sac and aac that prevent them from executing in parallel, yielding 2× more operations than other multiplier-less CNNs such as DeepShift-PS and AdderNet. To overcome this performance bottleneck, we propose a novel technique that enables pipelined processing between sac and aac, effectively reducing latency. The proposed ShiftAddNet-18 was evaluated on a small ResNet-18, achieving a latency of 11.37 ms per image, ∼69.21% faster than the original version's 19.24 ms. On a denser network, the proposed pipeline ShiftAddNet-101 requires only 61.92 ms compared with the original version's 98.85 ms, a latency reduction of ∼37.1%. Compared with the state-of-the-art multiplier-less CNN core (e.g., AdderNet), our work is 20% slower in latency but provides higher accuracy and consumes 2.2× fewer DSPs.
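To make the abstract's two primitives concrete, here is a minimal Python sketch (not from the paper) of what "multiplier-less" means for the two kernel styles it combines: a DeepShift-PS-style Shift-Accumulate, where each weight has the form sign · 2^p so multiplication reduces to a bit shift, and an AdderNet-style Add-Accumulate, which scores similarity with a negated L1 distance using only additions and subtractions. The function names and integer-activation assumption are illustrative, not the authors' implementation.

```python
def shift_accumulate(xs, signs, powers):
    """Shift-Accumulate (sac): dot product with weights sign * 2**p.

    Each term x * (s * 2**p) is computed as a bit shift plus a sign
    flip, so no hardware multiplier is needed. Negative p shifts right.
    """
    acc = 0
    for x, s, p in zip(xs, signs, powers):
        acc += s * (x << p) if p >= 0 else s * (x >> -p)
    return acc


def add_accumulate(xs, ws):
    """Add-Accumulate (aac): AdderNet-style negated L1 distance.

    Uses only subtraction, absolute value, and accumulation; larger
    (less negative) outputs mean the input patch resembles the filter.
    """
    return -sum(abs(x - w) for x, w in zip(xs, ws))


# Tiny example: a length-2 "patch" against each kernel style.
print(shift_accumulate([3, 5], signs=[1, -1], powers=[2, 0]))  # (3<<2) - 5 = 7
print(add_accumulate([3, 5], [1, 2]))                          # -(2 + 3) = -5
```

In ShiftAddNet the output of the shift stage feeds the add stage, which is the data dependency the paper's pipelining technique works around.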

Source journal: International Journal of Circuit Theory and Applications (Engineering: Electrical & Electronic)
CiteScore: 3.60
Self-citation rate: 34.80%
Articles per year: 277
Review time: 4.5 months
Aims and scope: The scope of the Journal comprises all aspects of the theory and design of analog and digital circuits, together with the application of the ideas and techniques of circuit theory in other fields of science and engineering. Examples of the areas covered include: fundamental circuit theory together with its mathematical and computational aspects; circuit modeling of devices; synthesis and design of filters and active circuits; neural networks; nonlinear and chaotic circuits; signal processing and VLSI; distributed, switched and digital circuits; power electronics; solid state devices. Contributions to CAD and simulation are welcome.