An Energy Efficient Time-Multiplexing Computing-in-Memory Architecture for Edge Intelligence

Impact factor: 2.0; JCR Q3; Computer Science, Hardware & Architecture

IEEE Journal on Exploratory Solid-State Computational Devices and Circuits. Pub Date: 2022-09-15. DOI: 10.1109/JXCDC.2022.3206879

Rui Xiao; Wenyu Jiang; Piew Yoong Chee

Vol. 8, no. 2, pp. 111-118. Article page: https://ieeexplore.ieee.org/document/9893208/

Citations: 1

Abstract

The growing data volume and complexity of deep neural networks (DNNs) require new architectures to surpass the limitation of the von-Neumann bottleneck, with computing-in-memory (CIM) as a promising direction for implementing energy-efficient neural networks. However, CIM’s peripheral sensing circuits are usually power- and area-hungry components. We propose a time-multiplexing CIM architecture (TM-CIM) based on memristive analog computing to share the peripheral circuits and process one column at a time. The memristor array is arranged in a column-wise manner that avoids wasting power/energy on unselected columns. In addition, digital-to-analog converter (DAC) power and energy efficiency, which turns out to be an even greater overhead than analog-to-digital converter (ADC), can be fine-tuned in TM-CIM for significant improvement. For a $256 \times 256$ crossbar array with a typical setting, TM-CIM saves $18.4\times$ in energy with 0.136 pJ/MAC efficiency, and $19.9\times$ area for 1T1R case and $15.9\times$ for 2T2R case. Performance estimation on VGG-16 indicates that TM-CIM can save over $16\times$ area. A tradeoff between the chip area, peak power, and latency is also presented, with a proposed scheme to further reduce the latency on VGG-16, without significantly increasing chip area and peak power.
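The idea of sharing one sensing circuit and processing one column per time step can be illustrated with a small numerical sketch. This is not the authors' circuit: it is a plain-Python abstraction that assumes an ideal crossbar (column current equals the dot product of input voltages and conductances) and a hypothetical uniform quantizer standing in for the shared ADC. The names `quantize`, `tm_cim_mvm`, and `array_energy_pj` are illustrative, and the column-wise array arrangement and DAC tuning described in the abstract are not modeled. The energy helper only multiplies the reported 0.136 pJ/MAC figure by the number of MACs in one full array operation.

```python
import random


def quantize(value, bits, full_scale):
    """Uniform quantizer used as a stand-in for a shared ADC (assumed model)."""
    levels = 2 ** bits - 1
    clipped = min(max(value, 0.0), full_scale)
    return round(clipped / full_scale * levels) / levels * full_scale


def tm_cim_mvm(conductance, inputs, adc_bits=8):
    """Matrix-vector multiply, one column per time step, with one shared ADC.

    conductance: list of rows, each a list of non-negative cell conductances.
    inputs: list of non-negative input voltages, one per row.
    Returns (quantized column outputs, ADC conversion count, ADC full scale).
    """
    rows = len(conductance)
    cols = len(conductance[0])
    g_max = max(max(row) for row in conductance)
    # Largest possible column current for this array and input range.
    full_scale = rows * g_max * max(inputs)
    if full_scale <= 0:
        raise ValueError("inputs and conductances must include a positive value")

    outputs = []
    conversions = 0
    for c in range(cols):  # time-multiplexing: only column c is active
        current = sum(inputs[r] * conductance[r][c] for r in range(rows))
        outputs.append(quantize(current, adc_bits, full_scale))
        conversions += 1  # the single shared ADC is reused every step
    return outputs, conversions, full_scale


def array_energy_pj(rows, cols, pj_per_mac=0.136):
    """Energy of one full array operation at a given per-MAC energy."""
    return rows * cols * pj_per_mac


if __name__ == "__main__":
    random.seed(0)
    g = [[random.random() for _ in range(4)] for _ in range(4)]
    v = [random.random() for _ in range(4)]
    out, steps, _ = tm_cim_mvm(g, v)
    print(out, steps)
    print(array_energy_pj(256, 256), "pJ per 256x256 array operation")
```

In this sketch the number of ADC conversions equals the number of columns, which makes the latency cost of sharing the peripheral circuit explicit: a single converter replaces one per column, at the price of one time step per column.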


Source journal