Edge Inference Engine for Deep & Random Sparse Neural Networks with 4-bit Cartesian-Product MAC Array and Pipelined Activation Aligner

2021 IEEE Hot Chips 33 Symposium (HCS), Pub Date: 2021-08-22, DOI: 10.1109/HCS52781.2021.9567328

Kota Ando, Jaehoon Yu, Kazutoshi Hirose, Hiroki Nakahara, Kazushi Kawamura, Thiem Van Chu, M. Motomura

Citations: 1

Abstract

A 4b-quantized convolutional neural network (CNN) inference engine for edge AI is presented, featuring a Cartesian-product MAC array and pipelined activation aligners that target deeply and randomly pruned models. A 40 nm prototype with 32x32 MACs and 5 Mb SRAM runs at 534 MHz and delivers 1.07 TOPS at 352 mW at 1.1 V; at 0.8 V and 234 MHz it attains 5.30 dense TOPS/W. Sparse efficiency reaches 26.5 TOPS/W when running a randomly pruned model (after 88% pruning). Training algorithms for obtaining highly efficient sparse/quantized models are also proposed.
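
The headline figures in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is not from the paper: the function name `derived_figures` and the dictionary keys are hypothetical, and it rests on three labeled assumptions: one MAC counts as two operations, dense throughput scales linearly with clock frequency, and the sparse and dense TOPS/W figures refer to the same operating point. The abstract states none of these explicitly.

```python
# Figures quoted in the abstract.
MAC_ROWS, MAC_COLS = 32, 32
FREQ_HIGH_HZ = 534e6         # clock at 1.1 V
FREQ_LOW_HZ = 234e6          # clock at 0.8 V
REPORTED_TOPS_HIGH = 1.07    # throughput at 1.1 V
POWER_HIGH_W = 0.352         # power at 1.1 V
DENSE_TOPS_PER_W_LOW = 5.30  # dense efficiency at 0.8 V
SPARSE_TOPS_PER_W = 26.5     # efficiency on the pruned model
PRUNING_RATE = 0.88


def derived_figures():
    """Derive secondary figures from the abstract's numbers (illustrative only)."""
    num_macs = MAC_ROWS * MAC_COLS

    # Assumption: 1 MAC = 2 ops (multiply + accumulate).
    nominal_peak_tops = 2 * num_macs * FREQ_HIGH_HZ / 1e12

    # Efficiency at 1.1 V, directly from the reported throughput and power.
    tops_per_w_high = REPORTED_TOPS_HIGH / POWER_HIGH_W

    # Assumption: dense throughput scales linearly with clock frequency.
    est_tops_low = REPORTED_TOPS_HIGH * FREQ_LOW_HZ / FREQ_HIGH_HZ
    est_power_low_w = est_tops_low / DENSE_TOPS_PER_W_LOW

    # Assumption: sparse and dense TOPS/W share the same operating point.
    sparse_gain = SPARSE_TOPS_PER_W / DENSE_TOPS_PER_W_LOW

    # Upper bound on the gain if every pruned weight were skipped for free.
    ideal_gain = 1.0 / (1.0 - PRUNING_RATE)

    return {
        "num_macs": num_macs,
        "nominal_peak_tops": nominal_peak_tops,
        "tops_per_w_high": tops_per_w_high,
        "est_tops_low": est_tops_low,
        "est_power_low_w": est_power_low_w,
        "sparse_gain": sparse_gain,
        "ideal_gain": ideal_gain,
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(f"{name}: {value:.4g}")
```

Under these assumptions the efficiency at 1.1 V works out to roughly 3.04 TOPS/W, and the sparse figure is about 5x the dense one, against an ideal bound of about 8.3x for 88% pruning. The nominal peak of 2 x 1024 x 534 MHz comes to about 1.09 TOPS, slightly above the reported 1.07 TOPS; the abstract does not say how its figure was counted.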
