SC-IMC: Algorithm-Architecture Co-Optimized SRAM-Based In-Memory Computing for Sine/Cosine and Convolutional Acceleration

IF 3.1, Zone 2 (Engineering & Technology), Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Transactions on Very Large Scale Integration (VLSI) Systems. Pub Date: 2025-06-11. DOI: 10.1109/TVLSI.2025.3573753

Qi Cao; Shang Wang; Haisheng Fu; Qifan Gao; Zhenjiao Chen; Li Gao; Feng Liang

Vol. 33, No. 8, pp. 2200-2213. https://ieeexplore.ieee.org/document/11030322/

Citations: 0

Abstract

Sine/cosine (SC) functions are widely used in practical engineering applications such as image compression and motor control. However, because these applications are power-sensitive and speed-critical, SC acceleration is limited on traditional von Neumann architectures. To overcome this challenge, we propose accelerating SC and convolution with a static random access memory (SRAM)-based in-memory computing (IMC) architecture through algorithm-architecture co-optimization. We develop the first SC algorithm that transforms nonlinear operations into the IMC paradigm, enabling the IMC array to handle both SC and artificial intelligence (AI) tasks and making the IMC array a reusable module. Our architecture extends the computing functions of a macro dedicated to convolutional neural networks (CNNs) with less than a 1% area increase. For FP32 data, the proposed SC algorithm stays within an error margin of 1 unit in the least significant place (ulp) compared with the C math library. Moreover, we build an intelligent IMC system that supports various CNNs. Our IMC macro implements 512-kb binary weight storage within a 3.0366-mm² area in SMIC 28-nm technology and achieves area and energy efficiencies of 2160.29–270.04 GOPS/mm² and 513.95–8.03 TOPS/W, respectively, in CNN mode. The proposed algorithm and architecture facilitate the integration of more nonlinear functions into IMC with minimal area overhead.
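The abstract does not describe the SC algorithm itself, so the following is only a generic illustration of the two ideas it names: recasting a nonlinear function as a multiply-accumulate (the operation an IMC array performs) and checking FP32 results against a math library in ulps. Everything in it is an assumption made for illustration, not the authors' method: the Taylor-coefficient "weights", the restriction to |x| ≤ π/4, the FP64 accumulation, and the helper names `to_f32`, `f32_ordinal`, `ulp_distance`, `sin_as_dot_product`, and `worst_ulp_error`.

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def f32_ordinal(x: float) -> int:
    """Map an FP32 value to an integer so that adjacent floats differ by 1."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # FP32 is sign-magnitude: mirror negative values so the mapping is monotonic.
    return -(bits & 0x7FFFFFFF) if bits & 0x80000000 else bits


def ulp_distance(a: float, b: float) -> int:
    """Number of representable FP32 steps between a and b."""
    return abs(f32_ordinal(a) - f32_ordinal(b))


# Hypothetical fixed "weights": odd Taylor coefficients of sin(x) up to x^11.
SIN_WEIGHTS = [(-1) ** k / math.factorial(2 * k + 1) for k in range(6)]


def sin_as_dot_product(x: float) -> float:
    """Evaluate sin(x) for |x| <= pi/4 as a single multiply-accumulate pass."""
    inputs = [x ** (2 * k + 1) for k in range(6)]  # x, x^3, ..., x^11
    acc = sum(w * v for w, v in zip(SIN_WEIGHTS, inputs))  # MAC, done in FP64
    return to_f32(acc)


def worst_ulp_error(samples: int = 1000) -> int:
    """Largest FP32 ulp gap versus math.sin over [-pi/4, pi/4]."""
    step = (math.pi / 2) / samples
    xs = [to_f32(-math.pi / 4 + i * step) for i in range(samples + 1)]
    return max(
        ulp_distance(sin_as_dot_product(x), to_f32(math.sin(x))) for x in xs
    )


print("worst error in FP32 ulps:", worst_ulp_error())
```

The reference here is Python's `math.sin` rounded to FP32, standing in for the C math library mentioned in the abstract. A hardware macro with binary weight storage would quantize the coefficients and inputs differently, so this sketch says nothing about the accuracy or cost of the proposed design.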


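Source journal

IEEE Transactions on Very Large Scale Integration (VLSI) Systems (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 6.40

Self-citation rate: 7.10%

Articles published: 187

Review time: 3.6 months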

About the journal: The IEEE Transactions on VLSI Systems is published as a monthly journal under the co-sponsorship of the IEEE Circuits and Systems Society, the IEEE Computer Society, and the IEEE Solid-State Circuits Society. Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chip and wafer fabrication, packaging, testing, and systems applications. Generation of specifications, design, and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor, and process levels. To address this critical area through a common forum, the IEEE Transactions on VLSI Systems was founded. The editorial board, consisting of international experts, invites original papers that emphasize the novel systems integration aspects of microelectronic systems, including interactions among systems design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chip and wafer fabrication, testing and packaging, and systems-level qualification. Thus, the coverage of these Transactions focuses on VLSI/ULSI microelectronic systems integration.