{"title":"SC-IMC:基于sram的正弦/余弦和卷积加速的算法架构协同优化内存计算","authors":"Qi Cao;Shang Wang;Haisheng Fu;Qifan Gao;Zhenjiao Chen;Li Gao;Feng Liang","doi":"10.1109/TVLSI.2025.3573753","DOIUrl":null,"url":null,"abstract":"Sine/cosine (SC) is widely used in practical engineering applications, such as image compression and motor control. Nevertheless, due to power sensitivity and speed demands, SC acceleration suffers from limitations in traditional von-Neumann architectures. To overcome this challenge, we propose accelerating SC and convolution using a static random access memory (SRAM)-based in-memory computing (IMC) architecture through an algorithm-architecture co-optimization manner. We develop the first SC algorithm that transforms nonlinear operations into the IMC paradigm, enabling IMC array to handle both SC and artificial intelligence (AI) tasks and making the IMC array a reusable module. Our architecture extends computing functions of macro dedicated to convolutional neural networks (CNNs), with less than a 1% area increase. The proposed SC algorithm for FP32 data achieves high accuracy within 1 unit in the least significant place (ulp) error margin compared with <italic>C</i> math library. Moreover, we build an intelligent IMC system that supports various CNNs. Our IMC macro implements 512-kb binary weight storage within 3.0366-mm<sup>2</sup> area in SMIC 28-nm technology and presents area/energy efficiency of 2160.29–270.04 GOPS/mm<sup>2</sup> and 513.95–8.03 TOPS/W in CNN mode. The proposed algorithm and architecture facilitate the integration of more nonlinear functions into IMC with minimal area overhead.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 8","pages":"2200-2213"},"PeriodicalIF":3.1000,"publicationDate":"2025-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"SC-IMC: Algorithm-Architecture Co-Optimized SRAM-Based In-Memory Computing for Sine/Cosine and Convolutional Acceleration\",\"authors\":\"Qi Cao;Shang Wang;Haisheng Fu;Qifan Gao;Zhenjiao Chen;Li Gao;Feng Liang\",\"doi\":\"10.1109/TVLSI.2025.3573753\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Sine/cosine (SC) is widely used in practical engineering applications, such as image compression and motor control. Nevertheless, due to power sensitivity and speed demands, SC acceleration suffers from limitations in traditional von-Neumann architectures. To overcome this challenge, we propose accelerating SC and convolution using a static random access memory (SRAM)-based in-memory computing (IMC) architecture through an algorithm-architecture co-optimization manner. We develop the first SC algorithm that transforms nonlinear operations into the IMC paradigm, enabling IMC array to handle both SC and artificial intelligence (AI) tasks and making the IMC array a reusable module. Our architecture extends computing functions of macro dedicated to convolutional neural networks (CNNs), with less than a 1% area increase. The proposed SC algorithm for FP32 data achieves high accuracy within 1 unit in the least significant place (ulp) error margin compared with <italic>C</i> math library. Moreover, we build an intelligent IMC system that supports various CNNs. 
Our IMC macro implements 512-kb binary weight storage within 3.0366-mm<sup>2</sup> area in SMIC 28-nm technology and presents area/energy efficiency of 2160.29–270.04 GOPS/mm<sup>2</sup> and 513.95–8.03 TOPS/W in CNN mode. The proposed algorithm and architecture facilitate the integration of more nonlinear functions into IMC with minimal area overhead.\",\"PeriodicalId\":13425,\"journal\":{\"name\":\"IEEE Transactions on Very Large Scale Integration (VLSI) Systems\",\"volume\":\"33 8\",\"pages\":\"2200-2213\"},\"PeriodicalIF\":3.1000,\"publicationDate\":\"2025-06-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Very Large Scale Integration (VLSI) Systems\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/11030322/\",\"RegionNum\":2,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/11030322/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
SC-IMC: Algorithm-Architecture Co-Optimized SRAM-Based In-Memory Computing for Sine/Cosine and Convolutional Acceleration
Sine/cosine (SC) computation is widely used in practical engineering applications such as image compression and motor control. Nevertheless, owing to power-sensitivity and speed demands, SC acceleration is limited in traditional von Neumann architectures. To overcome this challenge, we propose accelerating SC and convolution with a static random access memory (SRAM)-based in-memory computing (IMC) architecture through algorithm-architecture co-optimization. We develop the first SC algorithm that transforms these nonlinear operations into the IMC paradigm, enabling the IMC array to handle both SC and artificial intelligence (AI) tasks and making it a reusable module. Our architecture extends the computing functions of a macro dedicated to convolutional neural networks (CNNs) with less than a 1% area increase. The proposed SC algorithm for FP32 data achieves high accuracy, staying within a 1 unit in the last place (ulp) error margin of the C math library. Moreover, we build an intelligent IMC system that supports various CNNs. Our IMC macro implements 512-kb binary weight storage within a 3.0366-mm² area in SMIC 28-nm technology and delivers an area/energy efficiency of 2160.29–270.04 GOPS/mm² and 513.95–8.03 TOPS/W in CNN mode. The proposed algorithm and architecture facilitate the integration of more nonlinear functions into IMC with minimal area overhead.
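The abstract's central idea is mapping a nonlinear SC evaluation onto the multiply-accumulate (MAC) primitive that an SRAM IMC array natively provides, while reusing the same array for CNN workloads. The paper's actual SC algorithm is not reproduced here; the sketch below only illustrates, as an assumption, one conventional way a sine evaluation can be reduced to a short MAC chain (a range-limited polynomial in Horner form) and checked against the C math library, echoing the paper's accuracy baseline. The function name sin_poly_mac and the plain Taylor coefficients are illustrative choices, not the authors', and the accuracy is far looser than the paper's 1-ulp result.

/*
 * Hypothetical sketch: sin(x) expressed as a chain of multiply-accumulate
 * (MAC) operations, the primitive an SRAM-based IMC MAC array performs.
 * This is NOT the paper's SC algorithm; it only shows how a nonlinear
 * function can be cast into the MAC paradigm.
 * Build with: cc sketch.c -lm
 */
#include <math.h>
#include <stdio.h>

/* Evaluate sin(x) for x already reduced to roughly [-pi/4, pi/4] using a
 * degree-7 odd Taylor polynomial in Horner (MAC-chain) form. */
static float sin_poly_mac(float x)
{
    const float c3 = -1.6666667e-1f;  /* -1/3!  */
    const float c5 =  8.3333333e-3f;  /*  1/5!  */
    const float c7 = -1.9841270e-4f;  /* -1/7!  */
    float x2 = x * x;
    /* Each step below is one multiply-accumulate. */
    float p = c7;
    p = p * x2 + c5;
    p = p * x2 + c3;
    p = p * x2 + 1.0f;
    return p * x;
}

int main(void)
{
    /* Compare the MAC-chain result against the C math library (sinf),
     * the same reference the paper measures its ulp error against. */
    for (float x = -0.7f; x <= 0.7f; x += 0.1f) {
        printf("x=%+.2f  mac=%+.7f  libm=%+.7f\n",
               x, sin_poly_mac(x), sinf(x));
    }
    return 0;
}

In an IMC setting, the polynomial coefficients would sit in the SRAM array as stored weights and the powers of x would be streamed as inputs, so each Horner step becomes one in-memory MAC cycle; this reuse of the CNN MAC datapath is what keeps the reported area overhead below 1%.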
Journal overview:
The IEEE Transactions on VLSI Systems is published as a monthly journal under the co-sponsorship of the IEEE Circuits and Systems Society, the IEEE Computer Society, and the IEEE Solid-State Circuits Society.
Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing and systems applications. Generation of specifications, design and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor and process levels.
To address this critical area through a common forum, the IEEE Transactions on VLSI Systems has been founded. The editorial board, consisting of international experts, invites original papers which emphasize and merit the novel systems integration aspects of microelectronic systems including interactions among systems design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chips and wafer fabrication, testing and packaging, and systems level qualification. Thus, the coverage of these Transactions will focus on VLSI/ULSI microelectronic systems integration.