Hop-CIM: An all-digital two-level approximate SRAM-CIM macro for high energy-efficient HNN acceleration with data-aware early exit and column-wise partial-sum reuse

IF 2.5 | Zone 3 (Engineering & Technology) | JCR Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

Integration-The Vlsi Journal | Pub Date: 2025-08-21 | DOI: 10.1016/j.vlsi.2025.102525

Shunqin Cai, Liukai Xu, Wentao Liu, Dengfeng Wang, Keqing Ouyang, Jinyu Wang, Weizhong Wu, Qiang Huang, Zhi Li, Yanan Sun

Volume 105, Article 102525. Source: https://www.sciencedirect.com/science/article/pii/S0167926025001828

Citations: 0

Abstract

Hopfield Neural Networks (HNNs) have emerged as a promising paradigm for image restoration, because their inherent associative-memory properties allow corrupted images to be reconstructed. However, traditional HNN accelerators that rely on full-precision vector-matrix multiplication (VMM) introduce significant computational redundancy, because the binary state-update process of HNNs depends solely on the sign of the VMM results rather than on their precise values. To address this inefficiency, Hop-CIM, an all-digital approximate SRAM-based computing-in-memory (SRAM-CIM) macro, is proposed for energy-efficient HNN acceleration. The key innovations of Hop-CIM are: (1) a two-level approximation strategy that fully exploits the error-tolerant characteristics of HNNs, (2) a data-aware, threshold-based early-exit mechanism applied during tile-by-tile partial-sum accumulation, and (3) a partial-sum reuse method with column-wise weight-matrix compression. Together, (2) and (3) reduce redundant multiply-and-accumulate (MAC) operations by 55.7%. Experimental results show that, at the 28 nm technology node, the Hop-CIM macro delivers an energy efficiency of 1591 TOPS/W with 1-bit/1-bit input/weight quantization, outperforming a traditional full-precision SRAM-CIM design with 1-bit/3-bit input/weight quantization by 7.72x and an approximate SRAM-CIM design with 1-bit/1-bit input/weight quantization by 2.48x. In addition, Hop-CIM achieves a structural similarity index measure (SSIM) of 0.944 for 28 × 28 image restoration.
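The sign-only observation behind the early-exit idea can be illustrated with a small software sketch. This is not the paper's circuit or its threshold rule, which the abstract does not specify. It assumes inputs and weights in {-1, +1}, a hypothetical tile size, and a conservative bound-based threshold: accumulation stops once the magnitude of the running partial sum exceeds the largest contribution the unprocessed tiles could still make. The function names `sign_with_early_exit` and `hnn_update` are illustrative only.

```python
import numpy as np

def sign_with_early_exit(x, w_col, tile=16):
    """Return (sign, tiles_used) for dot(x, w_col), with x and w_col in {-1, +1}.

    Assumed threshold rule (not taken from the paper): stop as soon as the
    running partial sum can no longer change sign.
    """
    n = x.shape[0]
    acc = 0
    tiles_used = 0
    for start in range(0, n, tile):
        end = min(start + tile, n)
        # Accumulate one tile of the vector-matrix product.
        acc += int(np.dot(x[start:end], w_col[start:end]))
        tiles_used += 1
        # Each remaining element contributes at most 1 in magnitude.
        remaining = n - end
        if abs(acc) > remaining:
            break  # the sign is already decided; skip the remaining tiles
    # Tie convention (assumption): a zero sum maps to +1.
    return (1 if acc >= 0 else -1), tiles_used

def hnn_update(x, W, tile=16):
    """One synchronous HNN state update that uses only the sign of each column sum."""
    new_state = np.empty_like(x)
    total_tiles = 0
    for j in range(W.shape[1]):
        new_state[j], used = sign_with_early_exit(x, W[:, j], tile)
        total_tiles += used
    return new_state, total_tiles
```

Because this bound is conservative, the sketch returns exactly the same signs as a full-precision product and only skips work. The abstract describes Hop-CIM as approximate and data-aware, so its actual threshold presumably trades some accuracy for further savings.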


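Source journal

Integration-The Vlsi Journal (Engineering & Technology - Engineering: Electrical & Electronic)

CiteScore: 3.80

Self-citation rate: 5.30%

Articles published: 107

Review time: 6 months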

Journal description: Integration's aim is to cover every aspect of the VLSI area, with an emphasis on cross-fertilization between various fields of science, and the design, verification, test and applications of integrated circuits and systems, as well as closely related topics in process and device technologies. Individual issues will feature peer-reviewed tutorials and articles as well as reviews of recent publications. The intended coverage of the journal can be assessed by examining the following (non-exclusive) list of topics: Specification methods and languages; Analog/Digital Integrated Circuits and Systems; VLSI architectures; Algorithms, methods and tools for modeling, simulation, synthesis and verification of integrated circuits and systems of any complexity; Embedded systems; High-level synthesis for VLSI systems; Logic synthesis and finite automata; Testing, design-for-test and test generation algorithms; Physical design; Formal verification; Algorithms implemented in VLSI systems; Systems engineering; Heterogeneous systems.