{"title":"Enhancing ConvNets With ConvFIFO: A Crossbar PIM Architecture Based on Kernel-Stationary First-In-First-Out Dataflow","authors":"Yu Qian;Liang Zhao;Fanzi Meng;Xiapeng Xu;Cheng Zhuo;Xunzhao Yin","doi":"10.1109/TVLSI.2024.3409648","DOIUrl":null,"url":null,"abstract":"Convolutional neural networks (ConvNets) have long been the model of choice for computer vision (CV) problems and gained renewed traction lately. In order to compute ConvNets more efficiently, process-in-memory (PIM) architectures based on emerging non-volatile memories (NVMs) such as RRAM have been widely studied. However, conventional NVM-based PIM suffered from various non-idealities including IR drop, sneak-path currents, large analog-to-digital converter (ADC) overhead, device variations, circuits mismatch, and error propagation. In this work, we propose ConvFIFO, a crossbar-memory-based PIM architecture for ConvNets featuring a kernel-stationary dataflow. Through the design of FIFO-type input and output buffers, smaller row-activation parallelism, and more compact ADCs, ConvFIFO can maximize the reuse rates of inputs and partial sums to achieve a more balanced trade-off among throughput, accuracy, and area/energy consumption. Using SRAM-based FIFO as the input/output buffer, ConvFIFO achieves a systolic architecture without the need to move weight data, bypassing the limitation of NVM endurance and minimizing the movement of partial sums. Moreover, the FIFO nature of the dataflow allows flexible pipeline design and load balancing. Compared to classical NVM-based PIM architectures such as ISAAC, ConvFIFO exhibits significant performance enhancement for various ConvNet models, showing 1.66–\n<inline-formula> <tex-math>$1.69\\times $ </tex-math></inline-formula>\n/1.69–\n<inline-formula> <tex-math>$1.74\\times $ </tex-math></inline-formula>\n/4.23–\n<inline-formula> <tex-math>$4.79\\times $ </tex-math></inline-formula>\n/1.59–\n<inline-formula> <tex-math>$1.74\\times $ </tex-math></inline-formula>\n improvement in terms of energy consumption, latency, Ops/W, and Ops/s\n<inline-formula> <tex-math>$\\times $ </tex-math></inline-formula>\nmm2, respectively. Compared to GPUs, ConvFIFO exhibits only an average accuracy loss of 1.82% during inference.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"32 9","pages":"1640-1651"},"PeriodicalIF":2.8000,"publicationDate":"2024-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10559950/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
引用次数: 0
Abstract
Convolutional neural networks (ConvNets) have long been the model of choice for computer vision (CV) problems and gained renewed traction lately. In order to compute ConvNets more efficiently, process-in-memory (PIM) architectures based on emerging non-volatile memories (NVMs) such as RRAM have been widely studied. However, conventional NVM-based PIM suffered from various non-idealities including IR drop, sneak-path currents, large analog-to-digital converter (ADC) overhead, device variations, circuits mismatch, and error propagation. In this work, we propose ConvFIFO, a crossbar-memory-based PIM architecture for ConvNets featuring a kernel-stationary dataflow. Through the design of FIFO-type input and output buffers, smaller row-activation parallelism, and more compact ADCs, ConvFIFO can maximize the reuse rates of inputs and partial sums to achieve a more balanced trade-off among throughput, accuracy, and area/energy consumption. Using SRAM-based FIFO as the input/output buffer, ConvFIFO achieves a systolic architecture without the need to move weight data, bypassing the limitation of NVM endurance and minimizing the movement of partial sums. Moreover, the FIFO nature of the dataflow allows flexible pipeline design and load balancing. Compared to classical NVM-based PIM architectures such as ISAAC, ConvFIFO exhibits significant performance enhancement for various ConvNet models, showing 1.66–
$1.69\times $
/1.69–
$1.74\times $
/4.23–
$4.79\times $
/1.59–
$1.74\times $
improvement in terms of energy consumption, latency, Ops/W, and Ops/s
$\times $
mm2, respectively. Compared to GPUs, ConvFIFO exhibits only an average accuracy loss of 1.82% during inference.
期刊介绍:
The IEEE Transactions on VLSI Systems is published as a monthly journal under the co-sponsorship of the IEEE Circuits and Systems Society, the IEEE Computer Society, and the IEEE Solid-State Circuits Society.
Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing and systems applications. Generation of specifications, design and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor and process levels.
To address this critical area through a common forum, the IEEE Transactions on VLSI Systems have been founded. The editorial board, consisting of international experts, invites original papers which emphasize and merit the novel systems integration aspects of microelectronic systems including interactions among systems design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chips and wafer fabrication, testing and packaging, and systems level qualification. Thus, the coverage of these Transactions will focus on VLSI/ULSI microelectronic systems integration.