Hop-CIM: An all-digital two-level approximate SRAM-CIM macro for high energy-efficient HNN acceleration with data-aware early exit and column-wise partial-sum reuse

Impact factor: 2.5; CAS Zone 3 (Engineering & Technology); JCR Q3 (Computer Science, Hardware & Architecture)
Shunqin Cai, Liukai Xu, Wentao Liu, Dengfeng Wang, Keqing Ouyang, Jinyu Wang, Weizhong Wu, Qiang Huang, Zhi Li, Yanan Sun
Integration-The Vlsi Journal, vol. 105, Article 102525. Published 2025-08-21. DOI: 10.1016/j.vlsi.2025.102525
https://www.sciencedirect.com/science/article/pii/S0167926025001828
Citations: 0

Abstract

Hopfield Neural Networks (HNNs) have emerged as a promising paradigm for image restoration, since their inherent associative memory allows corrupted images to be reconstructed. However, traditional HNN accelerators that rely on full-precision vector-matrix multiplication (VMM) introduce significant computational redundancy, because the binary state-update process of an HNN depends solely on the sign of each VMM result rather than on its precise value. To address this inefficiency, Hop-CIM, an all-digital approximate SRAM-based computing-in-memory (SRAM-CIM) macro, is proposed for energy-efficient HNN acceleration. The key innovations of Hop-CIM are: (1) a two-level approximation strategy that fully exploits the error-tolerant characteristics of HNNs, (2) a data-aware, threshold-based early-exit mechanism applied during tile-by-tile partial-sum accumulation, and (3) a partial-sum reuse method with column-wise weight matrix compression. Together, (2) and (3) reduce the redundant multiply-and-accumulate (MAC) operations by 55.7%. Experimental results show that, in a 28 nm technology node, the Hop-CIM macro delivers an energy efficiency of 1591 TOPS/W with 1-bit/1-bit input/weight quantization, outperforming the traditional full-precision SRAM-CIM design with 1-bit/3-bit input/weight quantization and the approximate SRAM-CIM design with 1-bit/1-bit input/weight quantization by 7.72x and 2.48x, respectively. In addition, Hop-CIM achieves a structural similarity index measure (SSIM) of 0.944 for 28 × 28 image restoration.
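The abstract gives no circuit or algorithm details, so the following is only a software sketch of the two ideas it names, under stated assumptions. It assumes 1-bit states and weights encoded as +1/-1, and it uses an exact early-exit bound (stop once the running partial sum exceeds the number of inputs still to be accumulated, so the sign can no longer change) in place of the paper's data-aware threshold, which is not specified here. Partial-sum reuse is modeled as a cache of identical column segments within a tile, which is one possible reading of "column-wise weight matrix compression". The names `hnn_update`, `sign`, and `tile_rows` are hypothetical.

```python
def sign(v, fallback):
    """Return +1 or -1 for v; keep the previous state on a tie."""
    if v > 0:
        return 1
    if v < 0:
        return -1
    return fallback


def hnn_update(state, weights, tile_rows):
    """One synchronous HNN update that needs only the sign of each dot product.

    state:     list of +1/-1 neuron states (assumed 1-bit inputs)
    weights:   square list-of-rows matrix of +1/-1 values (assumed 1-bit weights)
    tile_rows: number of rows accumulated per tile
    Returns (new_state, macs_done).
    """
    n = len(state)
    new_state = []
    macs_done = 0
    cache = {}  # (tile start, column segment) -> partial sum, for reuse

    for j in range(n):
        psum = 0
        for start in range(0, n, tile_rows):
            stop = min(start + tile_rows, n)
            segment = tuple(weights[i][j] for i in range(start, stop))
            key = (start, segment)
            if key not in cache:
                # Cache miss: compute this tile's partial sum once.
                cache[key] = sum(
                    state[i] * w for i, w in zip(range(start, stop), segment)
                )
                macs_done += stop - start
            psum += cache[key]
            # Each remaining input contributes +1 or -1, so the sign is
            # already decided once |psum| exceeds the remaining count.
            remaining = n - stop
            if abs(psum) > remaining:
                break
        new_state.append(sign(psum, state[j]))
    return new_state, macs_done


# Usage: store one pattern with the outer-product rule, flip one bit, restore.
pattern = [1, -1, 1, 1, -1, -1, 1, -1]
weights = [[a * b for b in pattern] for a in pattern]
corrupted = [-pattern[0]] + pattern[1:]
restored, macs = hnn_update(corrupted, weights, tile_rows=2)
print(restored == pattern, macs, "of", len(pattern) ** 2, "MACs")
```

Because the exit condition here is an exact bound, the result matches a full-precision sign computation; a looser threshold, as the abstract suggests, would presumably trade a small error rate for earlier exits.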
Source journal
Integration-The Vlsi Journal (Engineering & Technology: Engineering, Electrical & Electronic)
CiteScore: 3.80
Self-citation rate: 5.30%
Articles published: 107
Review time: 6 months
Journal description: Integration's aim is to cover every aspect of the VLSI area, with an emphasis on cross-fertilization between various fields of science, and the design, verification, test and applications of integrated circuits and systems, as well as closely related topics in process and device technologies. Individual issues will feature peer-reviewed tutorials and articles as well as reviews of recent publications. The intended coverage of the journal can be assessed by examining the following (non-exclusive) list of topics: Specification methods and languages; Analog/Digital Integrated Circuits and Systems; VLSI architectures; Algorithms, methods and tools for modeling, simulation, synthesis and verification of integrated circuits and systems of any complexity; Embedded systems; High-level synthesis for VLSI systems; Logic synthesis and finite automata; Testing, design-for-test and test generation algorithms; Physical design; Formal verification; Algorithms implemented in VLSI systems; Systems engineering; Heterogeneous systems.