QUEST: A 7.49TOPS multi-purpose log-quantized DNN inference engine stacked on 96MB 3D SRAM using inductive-coupling technology in 40nm CMOS

Kodai Ueyoshi, Kota Ando, Kazutoshi Hirose, Shinya Takamaeda-Yamazaki, J. Kadomoto, T. Miyata, M. Hamada, T. Kuroda, M. Motomura
{"title":"QUEST: A 7.49TOPS multi-purpose log-quantized DNN inference engine stacked on 96MB 3D SRAM using inductive-coupling technology in 40nm CMOS","authors":"Kodai Ueyoshi, Kota Ando, Kazutoshi Hirose, Shinya Takamaeda-Yamazaki, J. Kadomoto, T. Miyata, M. Hamada, T. Kuroda, M. Motomura","doi":"10.1109/ISSCC.2018.8310261","DOIUrl":null,"url":null,"abstract":"A key consideration for deep neural network (DNN) inference accelerators is the need for large and high-bandwidth external memories. Although an architectural concept for stacking a DNN accelerator with DRAMs has been proposed previously, long DRAM latency remains problematic and limits the performance [1]. Recent algorithm-level optimizations, such as network pruning and compression, have shown success in reducing the DNN memory size [2]; however, since networks become irregular and sparse, they induce an additional need for agile random accesses to the memory systems.","PeriodicalId":6617,"journal":{"name":"2018 IEEE International Solid - State Circuits Conference - (ISSCC)","volume":"135 1","pages":"216-218"},"PeriodicalIF":0.0000,"publicationDate":"2018-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"62","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE International Solid - State Circuits Conference - (ISSCC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISSCC.2018.8310261","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 62

Abstract

A key consideration for deep neural network (DNN) inference accelerators is the need for large and high-bandwidth external memories. Although an architectural concept for stacking a DNN accelerator with DRAMs has been proposed previously, long DRAM latency remains problematic and limits the performance [1]. Recent algorithm-level optimizations, such as network pruning and compression, have shown success in reducing the DNN memory size [2]; however, since networks become irregular and sparse, they induce an additional need for agile random accesses to the memory systems.
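The abstract does not spell out the quantization scheme, but "log-quantized" in the title refers to representing weights as signed powers of two, which turns multiplications into bit shifts in hardware. The sketch below illustrates the general idea under that assumption; the function name, bit-width parameter, and exponent range are illustrative, not taken from the paper.

```python
import numpy as np

def log_quantize(w, bits=4, max_exp=0):
    """Quantize values to signed powers of two (log quantization).

    Each nonzero weight w maps to sign(w) * 2**e, where e is the nearest
    integer exponent within the representable range. With `bits` bits,
    2**bits - 2 exponent levels plus zero can be encoded, so a multiply
    reduces to an exponent add (a shift) in hardware.
    """
    min_exp = max_exp - (2 ** bits - 2)  # smallest representable exponent
    sign = np.sign(w)
    mag = np.abs(w)
    # Round each magnitude to the nearest power-of-two exponent.
    exp = np.round(np.log2(np.where(mag > 0, mag, 1.0)))
    exp = np.clip(exp, min_exp, max_exp)
    q = sign * np.exp2(exp)
    # Magnitudes that round below the smallest level are flushed to zero.
    q[mag < 2.0 ** (min_exp - 0.5)] = 0.0
    return q

# Example: weights collapse onto a sparse power-of-two grid.
w = np.array([0.8, -0.3, 0.05, 0.0009])
print(log_quantize(w, bits=4))  # -> [1.0, -0.25, 0.0625, 0.0009765625]
```

Because each weight is stored as a small exponent rather than a full fixed-point value, this style of quantization also shrinks the memory footprint, which complements the large on-stack 3D SRAM capacity the paper targets.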