{"title":"An Energy Efficient Time-Multiplexing Computing-in-Memory Architecture for Edge Intelligence","authors":"Rui Xiao;Wenyu Jiang;Piew Yoong Chee","doi":"10.1109/JXCDC.2022.3206879","DOIUrl":null,"url":null,"abstract":"The growing data volume and complexity of deep neural networks (DNNs) require new architectures to surpass the limitation of the von-Neumann bottleneck, with computing-in-memory (CIM) as a promising direction for implementing energy-efficient neural networks. However, CIM’s peripheral sensing circuits are usually power- and area-hungry components. We propose a time-multiplexing CIM architecture (TM-CIM) based on memristive analog computing to share the peripheral circuits and process one column at a time. The memristor array is arranged in a column-wise manner that avoids wasting power/energy on unselected columns. In addition, digital-to-analog converter (DAC) power and energy efficiency, which turns out to be an even greater overhead than analog-to-digital converter (ADC), can be fine-tuned in TM-CIM for significant improvement. For a 256*256 crossbar array with a typical setting, TM-CIM saves \n<inline-formula> <tex-math>$18.4\\times $ </tex-math></inline-formula>\n in energy with 0.136 pJ/MAC efficiency, and \n<inline-formula> <tex-math>$19.9\\times $ </tex-math></inline-formula>\n area for 1T1R case and \n<inline-formula> <tex-math>$15.9\\times $ </tex-math></inline-formula>\n for 2T2R case. Performance estimation on VGG-16 indicates that TM-CIM can save over \n<inline-formula> <tex-math>$16\\times $ </tex-math></inline-formula>\n area. A tradeoff between the chip area, peak power, and latency is also presented, with a proposed scheme to further reduce the latency on VGG-16, without significantly increasing chip area and peak power.","PeriodicalId":54149,"journal":{"name":"IEEE Journal on Exploratory Solid-State Computational Devices and Circuits","volume":"8 2","pages":"111-118"},"PeriodicalIF":2.0000,"publicationDate":"2022-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/6570653/9969523/09893208.pdf","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Journal on Exploratory Solid-State Computational Devices and Circuits","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/9893208/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
引用次数: 1
Abstract
The growing data volume and complexity of deep neural networks (DNNs) require new architectures to surpass the limitation of the von-Neumann bottleneck, with computing-in-memory (CIM) as a promising direction for implementing energy-efficient neural networks. However, CIM’s peripheral sensing circuits are usually power- and area-hungry components. We propose a time-multiplexing CIM architecture (TM-CIM) based on memristive analog computing to share the peripheral circuits and process one column at a time. The memristor array is arranged in a column-wise manner that avoids wasting power/energy on unselected columns. In addition, digital-to-analog converter (DAC) power and energy efficiency, which turns out to be an even greater overhead than analog-to-digital converter (ADC), can be fine-tuned in TM-CIM for significant improvement. For a 256*256 crossbar array with a typical setting, TM-CIM saves
$18.4\times $
in energy with 0.136 pJ/MAC efficiency, and
$19.9\times $
area for 1T1R case and
$15.9\times $
for 2T2R case. Performance estimation on VGG-16 indicates that TM-CIM can save over
$16\times $
area. A tradeoff between the chip area, peak power, and latency is also presented, with a proposed scheme to further reduce the latency on VGG-16, without significantly increasing chip area and peak power.