Low-Power and Area-Efficient CIM: An SRAM-based fully-digital computing-in-memory hardware acceleration processor with approximate adder tree for multi-precision sparse neural networks

IF 1.9 · CAS Tier 3 (Engineering & Technology) · JCR Q3, ENGINEERING, ELECTRICAL & ELECTRONIC
Zhendong Fang, Yi Wang, Yaohua Xu
{"title":"低功耗和区域高效CIM:一种基于sram的全数字内存计算硬件加速处理器,具有近似加法树,用于多精度稀疏神经网络","authors":"Zhendong Fang,&nbsp;Yi Wang,&nbsp;Yaohua Xu","doi":"10.1016/j.mejo.2025.106903","DOIUrl":null,"url":null,"abstract":"<div><div>Emergent architecture called computing-in-memory (CIM) effectively alleviates the issue of insufficient memory bandwidth and reduces energy consumption when accessing the on-chip buffer and registers. Some analog CIM macros designed to accelerate the neural network inference process have demonstrated significant improvements in both throughput and energy efficiency. These analog CIM macros are primarily utilized for neural networks with fixed activation and weight precision, which poses challenges for widespread deployment on edge devices with limited resources. On the other hand, analog macros exhibit heightened sensitivity to variations in process, voltage, and temperature, and the overhead associated with data conversion between analog and digital domains is unavoidable during calculation. Furthermore, exploring the sparse scheme compatible with CIM architecture can be beneficial in enhancing the energy efficiency of sparse neural network models. This article presents an SRAM-based fully-digital CIM hardware acceleration processor named Low-Power and Area-Efficient CIM (LPAE CIM), which combines the memory and computing macro of fully-digital architecture with peripheral storage and control modules to form a relatively comprehensive systematic structure. First, the sparsity of weight data is effectively utilized through a structured pruning method, and successive rows of the macro are opened flexibly for processing the sparsity of input activation. Second, the proposed area-friendly approximate adder tree replaces partial full adders with OR gates, reducing transistor count and promoting high-density integration of system-on-chip. Third, the shift adder outside the macro features dynamically adjustable 1–8 bit input activation and reconfigurable 4/8 bit storage weight, providing flexibility for fixed hardware resources. It achieves 148.5 TOPS/W energy efficiency at 4-bit activation and weight precision, which shows at least a 1.32× improvement over prior works.</div></div>","PeriodicalId":49818,"journal":{"name":"Microelectronics Journal","volume":"166 ","pages":"Article 106903"},"PeriodicalIF":1.9000,"publicationDate":"2025-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Low-Power and Area-Efficient CIM: An SRAM-based fully-digital computing-in-memory hardware acceleration processor with approximate adder tree for multi-precision sparse neural networks\",\"authors\":\"Zhendong Fang,&nbsp;Yi Wang,&nbsp;Yaohua Xu\",\"doi\":\"10.1016/j.mejo.2025.106903\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Emergent architecture called computing-in-memory (CIM) effectively alleviates the issue of insufficient memory bandwidth and reduces energy consumption when accessing the on-chip buffer and registers. Some analog CIM macros designed to accelerate the neural network inference process have demonstrated significant improvements in both throughput and energy efficiency. These analog CIM macros are primarily utilized for neural networks with fixed activation and weight precision, which poses challenges for widespread deployment on edge devices with limited resources. 
On the other hand, analog macros exhibit heightened sensitivity to variations in process, voltage, and temperature, and the overhead associated with data conversion between analog and digital domains is unavoidable during calculation. Furthermore, exploring the sparse scheme compatible with CIM architecture can be beneficial in enhancing the energy efficiency of sparse neural network models. This article presents an SRAM-based fully-digital CIM hardware acceleration processor named Low-Power and Area-Efficient CIM (LPAE CIM), which combines the memory and computing macro of fully-digital architecture with peripheral storage and control modules to form a relatively comprehensive systematic structure. First, the sparsity of weight data is effectively utilized through a structured pruning method, and successive rows of the macro are opened flexibly for processing the sparsity of input activation. Second, the proposed area-friendly approximate adder tree replaces partial full adders with OR gates, reducing transistor count and promoting high-density integration of system-on-chip. Third, the shift adder outside the macro features dynamically adjustable 1–8 bit input activation and reconfigurable 4/8 bit storage weight, providing flexibility for fixed hardware resources. It achieves 148.5 TOPS/W energy efficiency at 4-bit activation and weight precision, which shows at least a 1.32× improvement over prior works.</div></div>\",\"PeriodicalId\":49818,\"journal\":{\"name\":\"Microelectronics Journal\",\"volume\":\"166 \",\"pages\":\"Article 106903\"},\"PeriodicalIF\":1.9000,\"publicationDate\":\"2025-09-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Microelectronics Journal\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1879239125003522\",\"RegionNum\":3,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Microelectronics Journal","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1879239125003522","RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0

Abstract

Emergent architecture called computing-in-memory (CIM) effectively alleviates the issue of insufficient memory bandwidth and reduces energy consumption when accessing the on-chip buffer and registers. Some analog CIM macros designed to accelerate the neural network inference process have demonstrated significant improvements in both throughput and energy efficiency. These analog CIM macros are primarily utilized for neural networks with fixed activation and weight precision, which poses challenges for widespread deployment on edge devices with limited resources. On the other hand, analog macros exhibit heightened sensitivity to variations in process, voltage, and temperature, and the overhead associated with data conversion between analog and digital domains is unavoidable during calculation. Furthermore, exploring the sparse scheme compatible with CIM architecture can be beneficial in enhancing the energy efficiency of sparse neural network models. This article presents an SRAM-based fully-digital CIM hardware acceleration processor named Low-Power and Area-Efficient CIM (LPAE CIM), which combines the memory and computing macro of fully-digital architecture with peripheral storage and control modules to form a relatively comprehensive systematic structure. First, the sparsity of weight data is effectively utilized through a structured pruning method, and successive rows of the macro are opened flexibly for processing the sparsity of input activation. Second, the proposed area-friendly approximate adder tree replaces partial full adders with OR gates, reducing transistor count and promoting high-density integration of system-on-chip. Third, the shift adder outside the macro features dynamically adjustable 1–8 bit input activation and reconfigurable 4/8 bit storage weight, providing flexibility for fixed hardware resources. It achieves 148.5 TOPS/W energy efficiency at 4-bit activation and weight precision, which shows at least a 1.32× improvement over prior works.
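The abstract describes two mechanisms that are easier to grasp with a concrete picture: an adder tree in which some low-order full adders are replaced by OR gates, and a shift adder outside the macro that supports 1-8 bit activations by processing them bit-serially. The Python sketch below is only a behavioral illustration of those two ideas under assumed parameters; K_APPROX, WIDTH, the function names, and the unsigned bit-serial formulation are illustrative assumptions, not the authors' circuit or RTL.

```python
# Behavioral sketch (assumptions, not the authors' design): a lower-part
# OR-approximate adder, a pairwise approximate adder tree, and a bit-serial
# shift-accumulate MAC for configurable activation precision.

K_APPROX = 2   # assumed number of low-order bit positions approximated with OR gates
WIDTH = 16     # assumed partial-sum width inside the tree

def approx_add(a: int, b: int, k: int = K_APPROX, width: int = WIDTH) -> int:
    """Add two unsigned integers; the k LSBs are OR-ed (carry-free), the
    remaining high bits are added exactly."""
    low_mask = (1 << k) - 1
    low = (a | b) & low_mask                              # OR gates in place of full adders
    high = ((a >> k) + (b >> k)) & ((1 << (width - k)) - 1)
    return (high << k) | low

def approx_adder_tree(values):
    """Reduce a list of partial products with pairwise approximate additions,
    mimicking a binary adder tree."""
    vals = list(values)
    while len(vals) > 1:
        nxt = [approx_add(vals[i], vals[i + 1]) for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:
            nxt.append(vals[-1])                          # odd element passes through
        vals = nxt
    return vals[0]

def bit_serial_mac(activations, weights, act_bits=4):
    """Multi-precision MAC: feed activations one bit plane at a time and
    shift-accumulate the tree outputs, so the same tree serves 1-8 bit inputs."""
    acc = 0
    for b in range(act_bits):                             # LSB-first bit planes
        plane = [(a >> b) & 1 for a in activations]
        partial = approx_adder_tree([p * w for p, w in zip(plane, weights)])
        acc += partial << b                               # shift-add outside the tree
    return acc

if __name__ == "__main__":
    acts, wts = [3, 7, 1, 5], [2, 1, 4, 3]
    exact = sum(a * w for a, w in zip(acts, wts))
    approx = bit_serial_mac(acts, wts, act_bits=4)
    print(f"exact={exact}, approximate={approx}")         # approximate <= exact (dropped carries)
```

Because the OR gates drop carries in the approximated low bits, the result underestimates the exact dot product; the abstract's claim is that this error is traded for a lower transistor count and denser integration, while the bit-serial shift-add loop lets the same adder tree serve any activation precision from 1 to 8 bits.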
Source journal
Microelectronics Journal (Engineering: Electrical & Electronic)
CiteScore: 4.00
Self-citation rate: 27.30%
Annual articles: 222
Review time: 43 days
Journal description: Published since 1969, the Microelectronics Journal is an international forum for the dissemination of research and applications of microelectronic systems, circuits, and emerging technologies. Papers published in the Microelectronics Journal have undergone peer review to ensure originality, relevance, and timeliness. The journal thus provides a worldwide, regular, and comprehensive update on microelectronic circuits and systems.

The Microelectronics Journal invites papers describing significant research and applications in all of the areas listed below. Comprehensive review/survey papers covering recent developments will also be considered. The Microelectronics Journal covers circuits and systems. This topic includes but is not limited to: Analog, digital, mixed, and RF circuits and related design methodologies; Logic, architectural, and system level synthesis; Testing, design for testability, built-in self-test; Area, power, and thermal analysis and design; Mixed-domain simulation and design; Embedded systems; Non-von Neumann computing and related technologies and circuits; Design and test of high complexity systems integration; SoC, NoC, SIP, and NIP design and test; 3-D integration design and analysis; Emerging device technologies and circuits, such as FinFETs, SETs, spintronics, SFQ, MTJ, etc.

Application aspects such as signal and image processing including circuits for cryptography, sensors, and actuators including sensor networks, reliability and quality issues, and economic models are also welcome.