Accelerating Low Bit-Width Deep Convolution Neural Network in MRAM

2018 IEEE Computer Society Annual Symposium on VLSI (ISVLSI) Pub Date : 2018-07-01 DOI:10.1109/ISVLSI.2018.00103

Zhezhi He, Shaahin Angizi, Deliang Fan

Citations: 8

Abstract

Deep convolutional neural networks (CNNs) have achieved outstanding performance in image recognition on large-scale datasets. However, the pursuit of higher inference accuracy leads to CNN architectures with deeper layers and denser connections, whose hardware implementations inevitably demand ever more memory and computational resources. This can be interpreted as the ‘CNN power and memory wall’. Recent research efforts have significantly reduced both model size and computational complexity by using low bit-width weights, activations, and gradients while keeping reasonably good accuracy. In this work, we present different emerging nonvolatile Magnetic Random Access Memory (MRAM) designs that could be leveraged to implement a ‘bit-wise in-memory convolution engine’, which could simultaneously store network parameters and compute low bit-width convolution. This new computing model leverages the ‘in-memory computing’ concept to accelerate CNN inference and reduce convolution energy consumption, owing to its intrinsic logic-in-memory design and reduced data communication.
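To illustrate what a bit-wise convolution of low bit-width operands can look like, the following is a minimal software sketch, not the authors' MRAM design. It assumes unsigned fixed-point activations and weights and uses the commonly cited bit-plane decomposition, in which a dot product is rebuilt from AND and bit-count operations weighted by powers of two. The function names (`bit_planes`, `bitwise_dot`, `conv1d_bitwise`) are hypothetical and chosen only for this example.

```python
def bit_planes(values, bits):
    """Split unsigned integers into `bits` bit-planes.

    Plane k packs bit k of every element into one Python int
    (element i occupies bit position i of the plane).
    """
    planes = []
    for k in range(bits):
        plane = 0
        for i, v in enumerate(values):
            plane |= ((v >> k) & 1) << i
        planes.append(plane)
    return planes


def bitwise_dot(x, w, x_bits, w_bits):
    """Dot product of two unsigned low bit-width vectors.

    Assumption: every element of x fits in x_bits and every element
    of w fits in w_bits. Uses only AND, bit-count, shift, and add.
    """
    x_planes = bit_planes(x, x_bits)
    w_planes = bit_planes(w, w_bits)
    total = 0
    for m, xp in enumerate(x_planes):
        for n, wp in enumerate(w_planes):
            # AND selects positions where both bits are 1; the count of
            # those positions is scaled by the combined bit weight 2^(m+n).
            total += bin(xp & wp).count("1") << (m + n)
    return total


def conv1d_bitwise(signal, kernel, x_bits, w_bits):
    """Valid-mode 1-D sliding dot product (no kernel flip, CNN-style)."""
    k = len(kernel)
    return [
        bitwise_dot(signal[i:i + k], kernel, x_bits, w_bits)
        for i in range(len(signal) - k + 1)
    ]


if __name__ == "__main__":
    # 2-bit activations and 2-bit weights (values in 0..3).
    print(conv1d_bitwise([1, 2, 3, 0, 1], [1, 0, 2], x_bits=2, w_bits=2))
```

In this sketch the inner loop touches each pair of bit-planes once, so a dot product over M-bit activations and N-bit weights costs M × N AND-plus-bit-count steps regardless of vector length. That is the kind of operation a logic-in-memory array could, in principle, evaluate where the weights are stored.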


