{"title":"A Hybrid CAM-SRAM Processing-in-Memory Architecture With Feature Level Sparsity for Attention Mechanisms","authors":"Haiqiu Huang;Mingyu Wang;Xiaojie Li;Baiqing Zhong;Zeqi Yang;Tao Lu;Yicong Zhang;Zhiyi Yu","doi":"10.1109/TCSII.2025.3590432","DOIUrl":null,"url":null,"abstract":"The attention mechanism has become increasingly popular due to its ability to capture complex dependencies, enabling models like transformers to achieve remarkable performance in large language models (LLMs), computer vision, and other domains. However, the mechanism faces challenges such as low arithmetic intensity, leading to frequent data movement, and long sequence lengths, which introduce a large amount of redundant information. To mitigate both data movement and computational overhead in attention mechanisms, we propose a hybrid CAM-SRAM processing-in-memory architecture. By leveraging the parallel search and sort capabilities of content-addressable memory (CAM) arrays, we achieve dynamic fine-grained sparsification on features with varying variance, reducing the number of multiply-accumulate (MAC) operations in the matrix multiplication (MatMul). Furthermore, an approximate booth encoding is employed in our MAC unit to reduce the number of partial products and maintain the consistency of their signs. This eliminates the need for negation operations, simplifying the logic design. Experimental results show that, in different configurations, our feature-level sparsification scheme achieves over 80% sparsity with an acceptable accuracy drop. With sparsity up to 80%, our design achieves a performance of 0.252-1.26 TOPS and a power efficiency of 4.71-21.72 TOPS/W, operating at 1000 MHz on the TSMC 40nm process.","PeriodicalId":13101,"journal":{"name":"IEEE Transactions on Circuits and Systems II: Express Briefs","volume":"72 9","pages":"1283-1287"},"PeriodicalIF":4.9000,"publicationDate":"2025-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems II: Express Briefs","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/11084893/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0
Abstract
The attention mechanism has become increasingly popular due to its ability to capture complex dependencies, enabling models such as transformers to achieve remarkable performance in large language models (LLMs), computer vision, and other domains. However, the mechanism faces two challenges: low arithmetic intensity, which leads to frequent data movement, and long sequence lengths, which introduce a large amount of redundant information. To mitigate both the data movement and the computational overhead of attention mechanisms, we propose a hybrid CAM-SRAM processing-in-memory architecture. By leveraging the parallel search and sort capabilities of content-addressable memory (CAM) arrays, we achieve dynamic fine-grained sparsification of features with varying variance, reducing the number of multiply-accumulate (MAC) operations in the matrix multiplication (MatMul). Furthermore, an approximate Booth encoding is employed in our MAC unit to reduce the number of partial products and keep their signs consistent, which eliminates the need for negation operations and simplifies the logic design. Experimental results show that, across different configurations, our feature-level sparsification scheme achieves over 80% sparsity with an acceptable accuracy drop. With sparsity up to 80%, our design achieves a performance of 0.252-1.26 TOPS and a power efficiency of 4.71-21.72 TOPS/W, operating at 1000 MHz in a TSMC 40-nm process.
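The brief itself gives no code, so as a rough software analogue of the feature-level sparsification it describes, the Python sketch below keeps only the largest-magnitude features in each row and zeroes the rest; the function name sparsify_features, the magnitude-based selection rule, and the 80% default are assumptions for illustration, standing in for the CAM array's parallel search and sort, not the authors' implementation:

import numpy as np

def sparsify_features(x: np.ndarray, sparsity: float = 0.8) -> np.ndarray:
    """Zero out all but the largest-magnitude features in each row.

    A software stand-in for the CAM-based parallel search/sort the
    brief performs in hardware; `sparsity` is the fraction of features
    dropped (0.8 mirrors the ~80% sparsity reported in the abstract).
    """
    keep = max(1, int(round(x.shape[-1] * (1.0 - sparsity))))
    # Indices of the `keep` largest |x| entries per row.
    idx = np.argpartition(np.abs(x), -keep, axis=-1)[..., -keep:]
    mask = np.zeros_like(x, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=-1)
    return np.where(mask, x, 0.0)

# Example: an 8-token, 64-dim activation matrix; roughly 80% of the
# MACs in a subsequent MatMul with these activations become skippable.
x = np.random.randn(8, 64)
xs = sparsify_features(x)
print(f"achieved sparsity = {np.mean(xs == 0):.2f}")

Any MatMul consuming the sparsified operand can then skip zero-valued MACs, which is the source of the operation-count reduction the abstract claims.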
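The abstract does not detail the authors' approximate, sign-consistent Booth variant, so as background only, the following sketch shows standard (exact) radix-4 Booth recoding, which already halves the partial-product count relative to a plain array multiplier; function names and the digit-extraction loop are illustrative assumptions:

def booth_radix4_digits(y: int, bits: int = 8) -> list[int]:
    """Radix-4 Booth recoding of a two's-complement `bits`-bit multiplier.

    Returns ceil(bits/2) signed digits in {-2,-1,0,1,2}, i.e. roughly
    half as many partial products as bit-by-bit multiplication.
    `bits` is assumed even here for simplicity.
    """
    y &= (1 << bits) - 1
    # b[0] is the padded y_{-1} = 0; b[j+1] is bit j of y (LSB first).
    b = [0] + [(y >> j) & 1 for j in range(bits)]
    # Digit i examines the overlapping triplet (y_{2i+1}, y_{2i}, y_{2i-1}).
    return [-2 * b[2*i + 2] + b[2*i + 1] + b[2*i] for i in range(bits // 2)]

def booth_multiply(x: int, y: int, bits: int = 8) -> int:
    """Exact product as a sum of Booth partial products d_i * x * 4^i."""
    return sum(d * x * (4 ** i)
               for i, d in enumerate(booth_radix4_digits(y, bits)))

# Example: 4 partial products instead of 8 for an 8-bit multiplier.
assert booth_multiply(-7, 100) == -700
assert booth_multiply(13, -5) == -65

In exact Booth recoding the digits (and hence the partial products) can be negative, which in hardware requires negation logic; the brief's approximate variant keeps the partial-product signs consistent to remove exactly that cost, a detail this background sketch does not reproduce.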
Journal description:
TCAS II publishes brief papers on the theory, analysis, design, and practical implementation of circuits, and on the application of circuit techniques to systems and to signal processing. Coverage spans the whole spectrum from basic scientific theory to industrial applications. The fields of interest include:
Circuits: Analog, Digital and Mixed Signal Circuits and Systems
Nonlinear Circuits and Systems, Integrated Sensors, MEMS and Systems on Chip, Nanoscale Circuits and Systems, Optoelectronic Circuits and Systems, Power Electronics and Systems
Software for Analog-and-Logic Circuits and Systems
Control aspects of Circuits and Systems.