{"title":"Dyn-Bitpool:一个28纳米27 TOPS/W的双边稀疏CIM加速器,具有平衡的工作负载方案和高CIM宏观利用率","authors":"Xujiang Xiang;Zhiheng Yue;Xiaolong Zhang;Shaojun Wei;Yang Hu;Shouyi Yin","doi":"10.1109/TCSI.2025.3547001","DOIUrl":null,"url":null,"abstract":"Deep neural networks (DNNs) have brought about a transformative impact across various sectors. However, the proliferation of DNNs has led to a surge in computational intensity and data traffic, thereby imposing substantial demands on the power capacity and battery life of computing systems. Computing-in-memory (CIM) is considered a promising architecture to resolve or mitigate the memory wall challenge by integrating computational elements within memory arrays. Yet prior studies on CIM have seldom capitalized on sparsity in both activations and weights simultaneously. Furthermore, the exploitation of two-sided sparsity—sparsity in both activations and weights—presents new challenges, such as imbalanced workload and low hardware substrate utilization. To harness the full potential of two-sided sparsity for acceleration, we present Dyn-Bitpool, an accelerator that introduces innovations on two fronts: 1) a balanced workload scheme, “pool first and cross lane sharing”, which maximizes performance gains enabled by the bit-level sparsity in activations; and 2) a dynamic topology for CIM arrays to effectively address the low CIM macro utilization issue caused by the value-level sparsity in weights. These collective advancements yield an average speedup of 1.91x and 2.67x for Dyn-Bitpool on eight prevalent neural networks, outperforming two cutting-edge CIM-based accelerators.","PeriodicalId":13039,"journal":{"name":"IEEE Transactions on Circuits and Systems I: Regular Papers","volume":"72 5","pages":"2216-2228"},"PeriodicalIF":5.2000,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Dyn-Bitpool: A 28 nm 27 TOPS/W Two-Sided Sparse CIM Accelerator Featuring a Balanced Workload Scheme and High CIM Macro Utilization\",\"authors\":\"Xujiang Xiang;Zhiheng Yue;Xiaolong Zhang;Shaojun Wei;Yang Hu;Shouyi Yin\",\"doi\":\"10.1109/TCSI.2025.3547001\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep neural networks (DNNs) have brought about a transformative impact across various sectors. However, the proliferation of DNNs has led to a surge in computational intensity and data traffic, thereby imposing substantial demands on the power capacity and battery life of computing systems. Computing-in-memory (CIM) is considered a promising architecture to resolve or mitigate the memory wall challenge by integrating computational elements within memory arrays. Yet prior studies on CIM have seldom capitalized on sparsity in both activations and weights simultaneously. Furthermore, the exploitation of two-sided sparsity—sparsity in both activations and weights—presents new challenges, such as imbalanced workload and low hardware substrate utilization. To harness the full potential of two-sided sparsity for acceleration, we present Dyn-Bitpool, an accelerator that introduces innovations on two fronts: 1) a balanced workload scheme, “pool first and cross lane sharing”, which maximizes performance gains enabled by the bit-level sparsity in activations; and 2) a dynamic topology for CIM arrays to effectively address the low CIM macro utilization issue caused by the value-level sparsity in weights. 
These collective advancements yield an average speedup of 1.91x and 2.67x for Dyn-Bitpool on eight prevalent neural networks, outperforming two cutting-edge CIM-based accelerators.\",\"PeriodicalId\":13039,\"journal\":{\"name\":\"IEEE Transactions on Circuits and Systems I: Regular Papers\",\"volume\":\"72 5\",\"pages\":\"2216-2228\"},\"PeriodicalIF\":5.2000,\"publicationDate\":\"2025-03-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Circuits and Systems I: Regular Papers\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10931136/\",\"RegionNum\":1,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems I: Regular Papers","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10931136/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Dyn-Bitpool: A 28 nm 27 TOPS/W Two-Sided Sparse CIM Accelerator Featuring a Balanced Workload Scheme and High CIM Macro Utilization
Deep neural networks (DNNs) have brought about a transformative impact across various sectors. However, the proliferation of DNNs has led to a surge in computational intensity and data traffic, thereby imposing substantial demands on the power capacity and battery life of computing systems. Computing-in-memory (CIM) is considered a promising architecture for resolving or mitigating the memory-wall challenge by integrating computational elements within memory arrays. Yet prior studies on CIM have seldom capitalized on sparsity in both activations and weights simultaneously. Furthermore, exploiting two-sided sparsity (sparsity in both activations and weights) presents new challenges of its own, such as imbalanced workloads and low utilization of the hardware substrate. To harness the full potential of two-sided sparsity for acceleration, we present Dyn-Bitpool, an accelerator that introduces innovations on two fronts: 1) a balanced workload scheme, "pool first and cross lane sharing", which maximizes the performance gains enabled by bit-level sparsity in activations; and 2) a dynamic topology for CIM arrays that effectively addresses the low CIM macro utilization caused by value-level sparsity in weights. Together, these advancements yield average speedups of 1.91x and 2.67x for Dyn-Bitpool on eight prevalent neural networks over two cutting-edge CIM-based accelerators.
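To make the first idea concrete: in a bit-serial datapath, a lane's latency scales with the number of nonzero bits in its activation, so one dense activation stalls the whole tile. The sketch below is a minimal Python illustration under assumed lane counts and activation values; it reflects our reading of the "pool first and cross lane sharing" idea, not the authors' actual hardware.

# A minimal sketch, assuming a generic bit-serial MAC model (not the
# paper's circuit): a lane's latency equals its activation's popcount,
# so a tile stalls on its densest activation unless work is rebalanced.

def nonzero_bits(value, width=8):
    # Bit positions that actually contribute partial products.
    return [b for b in range(width) if (value >> b) & 1]

# Hypothetical 4-lane tile, one 8-bit activation per lane.
activations = [0b00000001, 0b01111111, 0b00010000, 0b00110011]
popcounts = [len(nonzero_bits(a)) for a in activations]

# Naive bit-serial: every lane waits for the slowest one.
naive_cycles = max(popcounts)

# Pooling and cross-lane sharing, as we read the idea: gather all
# nonzero bits into a shared pool so idle lanes can take over work,
# finishing in about ceil(total_bits / lanes) cycles.
lanes = len(activations)
balanced_cycles = -(-sum(popcounts) // lanes)  # ceiling division

print(f"per-lane popcounts: {popcounts}")      # [1, 7, 1, 4]
print(f"naive cycles:    {naive_cycles}")      # 7
print(f"balanced cycles: {balanced_cycles}")   # 4

The gap between the worst-case popcount (7) and the pooled average (4) is exactly the imbalance the balanced workload scheme is meant to recover.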
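For the second idea, a toy utilization model (again our own illustration with made-up filters and row counts, not the paper's remapping circuit) shows why statically mapping pruned weights leaves CIM rows idle, and how packing the surviving weights restores macro utilization.

# A toy model (illustrative assumptions, not the paper's design) of CIM
# macro utilization under value-level weight sparsity.

ROWS = 8  # hypothetical rows per CIM macro column

# Two pruned filters; zeros are weights removed by sparsification.
filter_a = [3, 0, 0, -2, 0, 5, 0, 0]
filter_b = [0, 1, 0, 0, 4, 0, 0, -7]

def static_utilization(filters, rows=ROWS):
    # Static mapping: each filter claims a full column, zeros included.
    nonzeros = sum(1 for f in filters for w in f if w != 0)
    return nonzeros / (rows * len(filters))

# Dynamic topology, as we read the idea: store only nonzero weights
# (with index metadata so partial sums route correctly), letting both
# filters share a single macro column.
packed = [(col, i, w)
          for col, f in enumerate([filter_a, filter_b])
          for i, w in enumerate(f) if w != 0]
assert len(packed) <= ROWS  # both filters now fit in one column

print(f"static utilization: {static_utilization([filter_a, filter_b]):.0%}")  # 38%
print(f"packed rows used:   {len(packed)}/{ROWS} in one column")              # 6/8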
Journal description:
TCAS I publishes regular papers in the field specified by the theory, analysis, design, and practical implementations of circuits, and the application of circuit techniques to systems and to signal processing. Included is the whole spectrum from basic scientific theory to industrial applications. The field of interest covered includes:
- Circuits: Analog, Digital and Mixed Signal Circuits and Systems
- Nonlinear Circuits and Systems, Integrated Sensors, MEMS and Systems on Chip, Nanoscale Circuits and Systems, Optoelectronic Circuits and Systems, Power Electronics and Systems
- Software for Analog-and-Logic Circuits and Systems
- Control aspects of Circuits and Systems