{"title":"Dyn-Bitpool:一个28纳米27 TOPS/W的双边稀疏CIM加速器,具有平衡的工作负载方案和高CIM宏观利用率","authors":"Xujiang Xiang;Zhiheng Yue;Xiaolong Zhang;Shaojun Wei;Yang Hu;Shouyi Yin","doi":"10.1109/TCSI.2025.3547001","DOIUrl":null,"url":null,"abstract":"Deep neural networks (DNNs) have brought about a transformative impact across various sectors. However, the proliferation of DNNs has led to a surge in computational intensity and data traffic, thereby imposing substantial demands on the power capacity and battery life of computing systems. Computing-in-memory (CIM) is considered a promising architecture to resolve or mitigate the memory wall challenge by integrating computational elements within memory arrays. Yet prior studies on CIM have seldom capitalized on sparsity in both activations and weights simultaneously. Furthermore, the exploitation of two-sided sparsity—sparsity in both activations and weights—presents new challenges, such as imbalanced workload and low hardware substrate utilization. To harness the full potential of two-sided sparsity for acceleration, we present Dyn-Bitpool, an accelerator that introduces innovations on two fronts: 1) a balanced workload scheme, “pool first and cross lane sharing”, which maximizes performance gains enabled by the bit-level sparsity in activations; and 2) a dynamic topology for CIM arrays to effectively address the low CIM macro utilization issue caused by the value-level sparsity in weights. These collective advancements yield an average speedup of 1.91x and 2.67x for Dyn-Bitpool on eight prevalent neural networks, outperforming two cutting-edge CIM-based accelerators.","PeriodicalId":13039,"journal":{"name":"IEEE Transactions on Circuits and Systems I: Regular Papers","volume":"72 5","pages":"2216-2228"},"PeriodicalIF":5.2000,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Dyn-Bitpool: A 28 nm 27 TOPS/W Two-Sided Sparse CIM Accelerator Featuring a Balanced Workload Scheme and High CIM Macro Utilization\",\"authors\":\"Xujiang Xiang;Zhiheng Yue;Xiaolong Zhang;Shaojun Wei;Yang Hu;Shouyi Yin\",\"doi\":\"10.1109/TCSI.2025.3547001\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep neural networks (DNNs) have brought about a transformative impact across various sectors. However, the proliferation of DNNs has led to a surge in computational intensity and data traffic, thereby imposing substantial demands on the power capacity and battery life of computing systems. Computing-in-memory (CIM) is considered a promising architecture to resolve or mitigate the memory wall challenge by integrating computational elements within memory arrays. Yet prior studies on CIM have seldom capitalized on sparsity in both activations and weights simultaneously. Furthermore, the exploitation of two-sided sparsity—sparsity in both activations and weights—presents new challenges, such as imbalanced workload and low hardware substrate utilization. To harness the full potential of two-sided sparsity for acceleration, we present Dyn-Bitpool, an accelerator that introduces innovations on two fronts: 1) a balanced workload scheme, “pool first and cross lane sharing”, which maximizes performance gains enabled by the bit-level sparsity in activations; and 2) a dynamic topology for CIM arrays to effectively address the low CIM macro utilization issue caused by the value-level sparsity in weights. 
These collective advancements yield an average speedup of 1.91x and 2.67x for Dyn-Bitpool on eight prevalent neural networks, outperforming two cutting-edge CIM-based accelerators.\",\"PeriodicalId\":13039,\"journal\":{\"name\":\"IEEE Transactions on Circuits and Systems I: Regular Papers\",\"volume\":\"72 5\",\"pages\":\"2216-2228\"},\"PeriodicalIF\":5.2000,\"publicationDate\":\"2025-03-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Circuits and Systems I: Regular Papers\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10931136/\",\"RegionNum\":1,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems I: Regular Papers","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10931136/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Dyn-Bitpool: A 28 nm 27 TOPS/W Two-Sided Sparse CIM Accelerator Featuring a Balanced Workload Scheme and High CIM Macro Utilization
Deep neural networks (DNNs) have brought about a transformative impact across various sectors. However, the proliferation of DNNs has led to a surge in computational intensity and data traffic, thereby imposing substantial demands on the power capacity and battery life of computing systems. Computing-in-memory (CIM) is considered a promising architecture for resolving or mitigating the memory-wall challenge by integrating computational elements within memory arrays. Yet prior studies on CIM have seldom capitalized on sparsity in both activations and weights simultaneously. Furthermore, exploiting two-sided sparsity (sparsity in both activations and weights) presents new challenges of its own, such as imbalanced workloads and low utilization of the hardware substrate. To harness the full potential of two-sided sparsity for acceleration, we present Dyn-Bitpool, an accelerator that introduces innovations on two fronts: 1) a balanced workload scheme, "pool first and cross lane sharing", which maximizes the performance gains enabled by bit-level sparsity in activations; and 2) a dynamic topology for CIM arrays that effectively addresses the low CIM macro utilization caused by value-level sparsity in weights. Together, these advancements yield average speedups of 1.91x and 2.67x for Dyn-Bitpool on eight prevalent neural networks over two cutting-edge CIM-based accelerators.
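To make the first idea concrete: in a bit-serial datapath, a lane's latency scales with the number of nonzero bits in its activation, so one dense activation stalls the whole tile. The sketch below is a minimal Python illustration under assumed lane counts and activation values; it reflects our reading of the "pool first and cross lane sharing" idea, not the authors' actual hardware.

# A minimal sketch, assuming a generic bit-serial MAC model (not the
# paper's circuit): a lane's latency equals its activation's popcount,
# so a tile stalls on its densest activation unless work is rebalanced.

def nonzero_bits(value, width=8):
    # Bit positions that actually contribute partial products.
    return [b for b in range(width) if (value >> b) & 1]

# Hypothetical 4-lane tile, one 8-bit activation per lane.
activations = [0b00000001, 0b01111111, 0b00010000, 0b00110011]
popcounts = [len(nonzero_bits(a)) for a in activations]

# Naive bit-serial: every lane waits for the slowest one.
naive_cycles = max(popcounts)

# Pooling and cross-lane sharing, as we read the idea: gather all
# nonzero bits into a shared pool so idle lanes can take over work,
# finishing in about ceil(total_bits / lanes) cycles.
lanes = len(activations)
balanced_cycles = -(-sum(popcounts) // lanes)  # ceiling division

print(f"per-lane popcounts: {popcounts}")      # [1, 7, 1, 4]
print(f"naive cycles:    {naive_cycles}")      # 7
print(f"balanced cycles: {balanced_cycles}")   # 4

The gap between the worst-case popcount (7) and the pooled average (4) is exactly the imbalance the balanced workload scheme is meant to recover.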
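For the second idea, a toy utilization model (again our own illustration with made-up filters and row counts, not the paper's remapping circuit) shows why statically mapping pruned weights leaves CIM rows idle, and how packing the surviving weights restores macro utilization.

# A toy model (illustrative assumptions, not the paper's design) of CIM
# macro utilization under value-level weight sparsity.

ROWS = 8  # hypothetical rows per CIM macro column

# Two pruned filters; zeros are weights removed by sparsification.
filter_a = [3, 0, 0, -2, 0, 5, 0, 0]
filter_b = [0, 1, 0, 0, 4, 0, 0, -7]

def static_utilization(filters, rows=ROWS):
    # Static mapping: each filter claims a full column, zeros included.
    nonzeros = sum(1 for f in filters for w in f if w != 0)
    return nonzeros / (rows * len(filters))

# Dynamic topology, as we read the idea: store only nonzero weights
# (with index metadata so partial sums route correctly), letting both
# filters share a single macro column.
packed = [(col, i, w)
          for col, f in enumerate([filter_a, filter_b])
          for i, w in enumerate(f) if w != 0]
assert len(packed) <= ROWS  # both filters now fit in one column

print(f"static utilization: {static_utilization([filter_a, filter_b]):.0%}")  # 38%
print(f"packed rows used:   {len(packed)}/{ROWS} in one column")              # 6/8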
Journal description:
TCAS I publishes regular papers in the field specified by the theory, analysis, design, and practical implementations of circuits, and the application of circuit techniques to systems and to signal processing. Included is the whole spectrum from basic scientific theory to industrial applications. The field of interest covered includes:
- Circuits: Analog, Digital and Mixed Signal Circuits and Systems
- Nonlinear Circuits and Systems, Integrated Sensors, MEMS and Systems on Chip, Nanoscale Circuits and Systems, Optoelectronic Circuits and Systems, Power Electronics and Systems
- Software for Analog-and-Logic Circuits and Systems
- Control aspects of Circuits and Systems