Low-Power and Area-Efficient CIM: An SRAM-based fully-digital computing-in-memory hardware acceleration processor with approximate adder tree for multi-precision sparse neural networks
Impact Factor: 1.9 · CAS Zone 3, Engineering & Technology · JCR Q3, Engineering, Electrical & Electronic
Authors: Zhendong Fang, Yi Wang, Yaohua Xu
DOI: 10.1016/j.mejo.2025.106903
Journal: Microelectronics Journal, Volume 166, Article 106903
Published: 2025-09-23
Citations: 0
Abstract
The emerging computing-in-memory (CIM) architecture alleviates the memory-bandwidth bottleneck and reduces the energy consumed in accessing on-chip buffers and registers. Several analog CIM macros designed to accelerate neural-network inference have demonstrated significant improvements in both throughput and energy efficiency. However, these analog macros are primarily built for neural networks with fixed activation and weight precision, which hinders widespread deployment on resource-limited edge devices. Analog macros are also highly sensitive to process, voltage, and temperature variations, and the overhead of converting data between the analog and digital domains is unavoidable during computation. Furthermore, sparsity schemes compatible with the CIM architecture can further improve the energy efficiency of sparse neural-network models. This article presents an SRAM-based fully digital CIM hardware acceleration processor named Low-Power and Area-Efficient CIM (LPAE CIM), which combines a fully digital memory-and-computing macro with peripheral storage and control modules to form a relatively complete system architecture. First, weight sparsity is exploited through a structured pruning method, and successive macro rows are enabled flexibly to handle input-activation sparsity. Second, the proposed area-friendly approximate adder tree replaces some of the full adders with OR gates, reducing transistor count and promoting high-density system-on-chip integration. Third, the shift adder outside the macro supports dynamically adjustable 1–8 bit input activations and reconfigurable 4/8 bit stored weights, providing flexibility within fixed hardware resources. The processor achieves 148.5 TOPS/W energy efficiency at 4-bit activation and weight precision, at least a 1.32× improvement over prior works.
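The full-adder-to-OR-gate substitution behind the approximate adder tree can be illustrated with a small software model. The sketch below is an illustrative assumption, not the paper's actual circuit: it treats the `k` least-significant bit positions as the approximate region, where each sum bit is a plain bitwise OR with no carry generated or propagated, while the remaining upper bits are added exactly. The function name, bit-width parameter, and split point `k` are all hypothetical choices for this sketch.

```python
def approx_add(a: int, b: int, k: int, width: int = 16) -> int:
    """Model an approximate adder: OR gates for the low k bits, exact add above.

    Bits [0, k): each "sum" bit is simply a | b, and no carry is produced,
    mimicking the replacement of full adders with OR gates.
    Bits [k, width): a conventional addition with a carry-in of zero.
    """
    low_mask = (1 << k) - 1
    low = (a | b) & low_mask                 # approximate region: OR, no carries
    high = ((a >> k) + (b >> k)) << k        # exact region: normal addition
    return (high | low) & ((1 << width) - 1)  # wrap to the modeled bit width

# With k = 0 the adder is exact; with k > 0 it may undershoot the true sum
# whenever both operands have 1s in the approximate region.
print(approx_add(5, 7, 0))   # exact: 12
print(approx_add(3, 1, 2))   # approximate: 3 (exact sum would be 4)
```

Dropping the low-order carry chain is what saves transistors: an OR gate replaces the sum and carry logic of a full adder, at the cost of a bounded, always-nonpositive error confined to the k approximate bit positions.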
About the journal:
Published since 1969, the Microelectronics Journal is an international forum for the dissemination of research and applications of microelectronic systems, circuits, and emerging technologies. Papers published in the Microelectronics Journal have undergone peer review to ensure originality, relevance, and timeliness. The journal thus provides a worldwide, regular, and comprehensive update on microelectronic circuits and systems.
The Microelectronics Journal invites papers describing significant research and applications in all of the areas listed below. Comprehensive review/survey papers covering recent developments will also be considered. The Microelectronics Journal covers circuits and systems. This topic includes but is not limited to: Analog, digital, mixed, and RF circuits and related design methodologies; Logic, architectural, and system level synthesis; Testing, design for testability, built-in self-test; Area, power, and thermal analysis and design; Mixed-domain simulation and design; Embedded systems; Non-von Neumann computing and related technologies and circuits; Design and test of high complexity systems integration; SoC, NoC, SIP, and NIP design and test; 3-D integration design and analysis; Emerging device technologies and circuits, such as FinFETs, SETs, spintronics, SFQ, MTJ, etc.
Application aspects such as signal and image processing including circuits for cryptography, sensors, and actuators including sensor networks, reliability and quality issues, and economic models are also welcome.