A-DSCNN: Depthwise Separable Convolutional Neural Network Inference Chip Design Using an Approximate Multiplier
Jin-Jia Shang, Nicholas Phipps, I-Chyn Wey, T. Teo
DOI: 10.3390/chips2030010 | Published: 19 July 2023
Citations: 0
Abstract
For Convolutional Neural Networks (CNNs), the Depthwise Separable CNN (DSCNN) is the preferred architecture for Application-Specific Integrated Circuit (ASIC) implementation on edge devices, and it benefits from the multi-mode approximate multiplier proposed in this work. The proposed approximate multiplier uses two 4-bit multiplication operations to implement a 12-bit multiplication operation by reusing the same multiplier array. With this approximate multiplier, sequential multiplication operations are pipelined in a modified DSCNN to fully utilize the Processing Element (PE) array in the convolutional layer. Two versions of the Approximate-DSCNN (A-DSCNN) accelerator were implemented in a TSMC 40 nm CMOS process with a 0.9 V supply voltage. At a clock frequency of 200 MHz, the designs achieve power efficiencies of 4.78 and 4.89 GOPs/mW while occupying 1.16 mm² and 0.398 mm² of area, respectively.
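The abstract does not spell out how two 4-bit operations combine into a 12-bit multiplication, so the sketch below illustrates one plausible segment-based interpretation (in the spirit of dynamic-segment approximate multipliers), not the paper's actual circuit: operand a is truncated to the 4-bit segment starting at its leading one, operand b to an 8-bit window split into two 4-bit halves, and the two 4×4 partial products are shifted back and summed. All function names and the exact segmentation here are illustrative assumptions.

```python
def lead_one(x, width=12):
    """Index of the most significant set bit of x, or -1 if x == 0."""
    for i in range(width - 1, -1, -1):
        if (x >> i) & 1:
            return i
    return -1

def seg4(x, width=12):
    """Return (4-bit segment starting at the leading one, restoring shift)."""
    k = lead_one(x, width)
    if k < 3:                       # small values already fit in 4 bits
        return x, 0
    shift = k - 3
    return (x >> shift) & 0xF, shift

def approx_mult_12x12(a, b):
    """Approximate 12x12 unsigned multiply built from two 4x4 multiplies.

    Hypothetical scheme: `a` is reduced to one 4-bit segment, `b` to two
    adjacent 4-bit segments; the two 4x4 partial products are shifted
    back and accumulated.
    """
    a_seg, a_sh = seg4(a)

    kb = lead_one(b)
    if kb < 7:                      # b fits in 8 bits: keep it exactly
        b_hi, b_lo, b_sh = (b >> 4) & 0xF, b & 0xF, 0
    else:
        b_sh = kb - 7
        b8 = (b >> b_sh) & 0xFF     # 8-bit window starting at the leading one
        b_hi, b_lo = b8 >> 4, b8 & 0xF

    # Two 4x4 products, as if issued back-to-back on one shared 4x4 array.
    p_hi = a_seg * b_hi
    p_lo = a_seg * b_lo
    return ((p_hi << 4) + p_lo) << (a_sh + b_sh)
```

When both operands fit within the retained segments the result is exact; otherwise the discarded low-order bits introduce a bounded error. Trading that error for a small, reused 4×4 array in place of a full 12×12 array is what yields the area and power savings typical of approximate multipliers.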