ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars

2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), pp. 14-26. Published: 2016-06-01. DOI: 10.1145/3007787.3001139

Ali Shafiee, Anirban Nag, Naveen Muralimanohar, R. Balasubramonian, J. Strachan, Miao Hu, R. S. Williams, Vivek Srikumar


Citations: 1416

Abstract

A number of recent efforts have attempted to design accelerators for popular machine learning algorithms, such as those involving convolutional and deep neural networks (CNNs and DNNs). These algorithms typically involve a large number of multiply-accumulate (dot-product) operations. A recent project, DaDianNao, adopts a near-data processing approach, in which a specialized neural functional unit performs all the digital arithmetic operations and receives input weights from adjacent eDRAM banks. This work explores an in-situ processing approach, in which memristor crossbar arrays not only store input weights but are also used to perform dot-product operations in the analog domain. While the use of crossbar memory as an analog dot-product engine is well known, no prior work has designed or characterized a full-fledged accelerator based on crossbars. In particular, our work makes the following contributions:

(i) We design a pipelined architecture, with some crossbars dedicated to each neural network layer, and eDRAM buffers that aggregate data between pipeline stages.
(ii) We define new data encoding techniques that are amenable to analog computation and that can reduce the high overheads of analog-to-digital conversion (ADC).
(iii) We define the many supporting digital components required in an analog CNN accelerator and carry out a design space exploration to identify the best balance of memristor storage/compute, ADCs, and eDRAM storage on a chip.

On a suite of CNN and DNN workloads, the proposed ISAAC architecture yields improvements of 14.8×, 5.5×, and 7.5× in throughput, energy, and computational density, respectively, relative to the state-of-the-art DaDianNao architecture.
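To make the idea of an analog dot product in a crossbar concrete, the following Python sketch models an idealized crossbar. Each cell contributes voltage × conductance (Ohm's law), and the currents on a shared column add up (Kirchhoff's current law). It also shows one way to build a wide integer dot product from narrow pieces: inputs are fed one bit per cycle, weights are split across several low-precision cells, and the partial column sums are recombined by shift-and-add. This is a minimal sketch under stated assumptions, not the paper's design. The integer "conductances", 1-bit inputs per cycle, 2-bit cells, 16-bit unsigned operands, 128-row crossbar, and all function names are hypothetical choices for illustration. The sketch ignores noise, device nonlinearity, signed values, and the paper's specific encoding techniques.

```python
from typing import List


def crossbar_column_currents(conductances: List[List[int]], voltages: List[int]) -> List[int]:
    """Idealized crossbar read: column j carries sum_i voltages[i] * conductances[i][j].

    Each cell contributes V * G (Ohm's law) and contributions on a shared
    column add up (Kirchhoff's current law). Integers stand in for analog values.
    """
    num_cols = len(conductances[0])
    return [
        sum(v * row[j] for v, row in zip(voltages, conductances))
        for j in range(num_cols)
    ]


def slice_weights(weights: List[int], bits_per_cell: int, num_slices: int) -> List[List[int]]:
    """Split each unsigned weight into num_slices cells of bits_per_cell bits each.

    Returns one crossbar row per weight; column s holds slice s (least significant first).
    """
    mask = (1 << bits_per_cell) - 1
    return [[(w >> (s * bits_per_cell)) & mask for s in range(num_slices)] for w in weights]


def adc_bits_required(rows: int, bits_per_cell: int, input_bits_per_cycle: int = 1) -> int:
    """Bits an ideal ADC needs to resolve every possible column sum exactly."""
    max_sum = rows * ((1 << bits_per_cell) - 1) * ((1 << input_bits_per_cycle) - 1)
    return max_sum.bit_length()


def bit_serial_dot(inputs: List[int], weights: List[int],
                   input_bits: int = 16, bits_per_cell: int = 2, weight_bits: int = 16) -> int:
    """Unsigned integer dot product computed one input bit per 'cycle' on a crossbar."""
    num_slices = weight_bits // bits_per_cell
    cells = slice_weights(weights, bits_per_cell, num_slices)
    total = 0
    for b in range(input_bits):
        # Hypothetical 1-bit input per cycle: row i is driven iff bit b of inputs[i] is set.
        voltages = [(x >> b) & 1 for x in inputs]
        # Ideal ADC: each column sum is digitized exactly (no noise or clipping modeled).
        column_sums = crossbar_column_currents(cells, voltages)
        for s, col in enumerate(column_sums):
            # Shift-and-add: scale by the input bit position and the weight slice position.
            total += col << (b + s * bits_per_cell)
    return total


if __name__ == "__main__":
    xs = [3, 65535, 1024, 7]
    ws = [40000, 2, 65535, 123]
    assert bit_serial_dot(xs, ws) == sum(x * w for x, w in zip(xs, ws))
    print(adc_bits_required(rows=128, bits_per_cell=2))  # prints 9
```

Under these assumptions, a 128-row column of 2-bit cells driven by 1-bit inputs can produce any sum from 0 to 384. An exact ADC therefore needs 9 bits, and it must be invoked once per column per input bit. This illustrates why analog-to-digital conversion is a dominant overhead and why encodings that reduce ADC resolution matter. The sketch itself does not model any such encoding.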

