A 1.40mm² 141mW 898GOPS sparse neuromorphic processor in 40nm CMOS

2016 IEEE Symposium on VLSI Circuits (VLSI-Circuits). Pub Date: 2016-06-15. DOI: 10.1109/VLSIC.2016.7573526

Phil C. Knag, Chester Liu, Zhengya Zhang

Citations: 10

Abstract

Sparsity is a brain-inspired property that enables a significant reduction in workload and power dissipation of deep learning. This work presents a 1.40mm² 40nm CMOS sparse neuromorphic processor that implements a two-layer convolutional restricted Boltzmann machine (CRBM) for inference and a support vector machine (SVM) classifier. The processor incorporates sparse convolvers to realize sparsity-proportional workload reduction. The architecture is parallelized along a non-sparse dimension to minimize stalling. At 0.9V and 240MHz, the processor achieves an effective 898.2GOPS performance, dissipating 140.9mW. Using sparsity, we reduce the workload, datapath power consumption and area by 3.4×, 3.3× and 1.74×, respectively. The design uses latch-based memory to reduce area and dynamic clock gating to save power.
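
To make "sparsity-proportional workload reduction" concrete, the following is a minimal software sketch, not the chip's circuit. It assumes a plain 2-D "full" convolution and compares a dense loop that visits every input pixel with a sparse loop that visits only the nonzero ones. The function names `dense_conv2d` and `sparse_conv2d` and the toy data are hypothetical and do not come from the paper.

```python
def dense_conv2d(image, kernel):
    """Full 2-D convolution that visits every input pixel.

    Returns (output, mac_count). Hypothetical helper for illustration only.
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * (w + kw - 1) for _ in range(h + kh - 1)]
    macs = 0
    for y in range(h):
        for x in range(w):
            # Scatter this pixel's contribution over the kernel footprint.
            for dy in range(kh):
                for dx in range(kw):
                    out[y + dy][x + dx] += image[y][x] * kernel[dy][dx]
                    macs += 1
    return out, macs


def sparse_conv2d(image, kernel):
    """Same convolution, but zero-valued inputs are skipped entirely.

    The multiply-accumulate count scales with the number of nonzero
    inputs rather than with the image size.
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * (w + kw - 1) for _ in range(h + kh - 1)]
    # Collect only the nonzero activations (assumed sparse input).
    nonzeros = [(y, x, v)
                for y, row in enumerate(image)
                for x, v in enumerate(row)
                if v != 0]
    macs = 0
    for y, x, v in nonzeros:
        for dy in range(kh):
            for dx in range(kw):
                out[y + dy][x + dx] += v * kernel[dy][dx]
                macs += 1
    return out, macs


if __name__ == "__main__":
    # Toy 4x4 input with 2 nonzero activations out of 16 (illustrative data).
    image = [[0, 0, 0, 0],
             [0, 1, 0, 0],
             [0, 0, 0, 2],
             [0, 0, 0, 0]]
    kernel = [[1, 0, -1],
              [2, 0, -2],
              [1, 0, -1]]
    dense_out, dense_macs = dense_conv2d(image, kernel)
    sparse_out, sparse_macs = sparse_conv2d(image, kernel)
    print("outputs match:", dense_out == sparse_out)
    print("dense MACs:", dense_macs, "sparse MACs:", sparse_macs)
```

Both functions produce the same output, but the sparse version performs work only for nonzero inputs. In this toy case the reduction is 8×; the 3.4× workload reduction quoted in the abstract is a measured property of the authors' design and workload, which this sketch does not attempt to reproduce.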



