An Area and Energy-Efficient Systolic Array Accelerator Architecture for Deep Neural Networks Using Stochastic Computing

IF 2.8 | CAS Zone 2 (Engineering & Technology) | JCR Q2 | COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
Jingwei Zhu;Jingguo Wu;Zongru Yang;Yu Jiang;Yun Chen
{"title":"基于随机计算的深度神经网络收缩阵列加速器结构","authors":"Jingwei Zhu;Jingguo Wu;Zongru Yang;Yu Jiang;Yun Chen","doi":"10.1109/TVLSI.2025.3550786","DOIUrl":null,"url":null,"abstract":"Deep neural networks (DNNs) are widely used to handle various intelligent tasks. With the increased model size, the DNNs’ hardware accelerators are challenging the higher area overhead and energy consumption. Stochastic computing (SC) has recently been considered for implementing DNNs and reducing hardware consumption. However, many current SC-based DNN accelerators fail to balance accuracy, performance, and resource overhead. In addition, their limited scalability and flexibility restrict their use in edge devices. In this article, we design an area and energy-efficient DNN accelerator architecture using SC. We propose an SC-binary hybrid processing unit with piecewise shift compensation without significant additional hardware overhead increment to improve the SC accuracy. To balance performance and resource overhead, we conduct a design space exploration (DSE) from an overall architectural perspective. An experimental platform with both software and hardware for SC-based DNNs is established. The software simulation results demonstrate that the best accuracy of the designed SC-DNN on the CIFAR-10 is 91.9%, which is 3.2% higher than that of the previous SC-DNN work. The VLSI implementation of the hardware is synthesized using the TSMC 28-nm CMOS process. Results show that compared to the binary computing counterpart, our design achieves <inline-formula> <tex-math>$2.7\\times $ </tex-math></inline-formula> area efficiency and <inline-formula> <tex-math>$3.4\\times $ </tex-math></inline-formula> energy efficiency. Compared to other SC-DNN accelerator designs, our design can provide <inline-formula> <tex-math>$5.3\\times $ </tex-math></inline-formula> area efficiency and <inline-formula> <tex-math>$7.3\\times $ </tex-math></inline-formula> energy efficiency.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 6","pages":"1582-1595"},"PeriodicalIF":2.8000,"publicationDate":"2025-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"An Area and Energy-Efficient Systolic Array Accelerator Architecture for Deep Neural Networks Using Stochastic Computing\",\"authors\":\"Jingwei Zhu;Jingguo Wu;Zongru Yang;Yu Jiang;Yun Chen\",\"doi\":\"10.1109/TVLSI.2025.3550786\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep neural networks (DNNs) are widely used to handle various intelligent tasks. With the increased model size, the DNNs’ hardware accelerators are challenging the higher area overhead and energy consumption. Stochastic computing (SC) has recently been considered for implementing DNNs and reducing hardware consumption. However, many current SC-based DNN accelerators fail to balance accuracy, performance, and resource overhead. In addition, their limited scalability and flexibility restrict their use in edge devices. In this article, we design an area and energy-efficient DNN accelerator architecture using SC. We propose an SC-binary hybrid processing unit with piecewise shift compensation without significant additional hardware overhead increment to improve the SC accuracy. To balance performance and resource overhead, we conduct a design space exploration (DSE) from an overall architectural perspective. 
An experimental platform with both software and hardware for SC-based DNNs is established. The software simulation results demonstrate that the best accuracy of the designed SC-DNN on the CIFAR-10 is 91.9%, which is 3.2% higher than that of the previous SC-DNN work. The VLSI implementation of the hardware is synthesized using the TSMC 28-nm CMOS process. Results show that compared to the binary computing counterpart, our design achieves <inline-formula> <tex-math>$2.7\\\\times $ </tex-math></inline-formula> area efficiency and <inline-formula> <tex-math>$3.4\\\\times $ </tex-math></inline-formula> energy efficiency. Compared to other SC-DNN accelerator designs, our design can provide <inline-formula> <tex-math>$5.3\\\\times $ </tex-math></inline-formula> area efficiency and <inline-formula> <tex-math>$7.3\\\\times $ </tex-math></inline-formula> energy efficiency.\",\"PeriodicalId\":13425,\"journal\":{\"name\":\"IEEE Transactions on Very Large Scale Integration (VLSI) Systems\",\"volume\":\"33 6\",\"pages\":\"1582-1595\"},\"PeriodicalIF\":2.8000,\"publicationDate\":\"2025-03-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Very Large Scale Integration (VLSI) Systems\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10937936/\",\"RegionNum\":2,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10937936/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Citations: 0

Abstract

Deep neural networks (DNNs) are widely used to handle various intelligent tasks. As model sizes grow, DNN hardware accelerators face higher area overhead and energy consumption. Stochastic computing (SC) has recently been considered for implementing DNNs and reducing hardware consumption. However, many current SC-based DNN accelerators fail to balance accuracy, performance, and resource overhead. In addition, their limited scalability and flexibility restrict their use in edge devices. In this article, we design an area- and energy-efficient DNN accelerator architecture using SC. We propose an SC-binary hybrid processing unit with piecewise shift compensation that improves SC accuracy without significant additional hardware overhead. To balance performance and resource overhead, we conduct a design space exploration (DSE) from an overall architectural perspective. An experimental platform with both software and hardware for SC-based DNNs is established. The software simulation results demonstrate that the best accuracy of the designed SC-DNN on CIFAR-10 is 91.9%, which is 3.2% higher than that of the previous SC-DNN work. The VLSI implementation of the hardware is synthesized using the TSMC 28-nm CMOS process. Results show that, compared to its binary computing counterpart, our design achieves $2.7\times$ area efficiency and $3.4\times$ energy efficiency. Compared to other SC-DNN accelerator designs, our design provides $5.3\times$ area efficiency and $7.3\times$ energy efficiency.
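
For readers unfamiliar with stochastic computing, the sketch below illustrates the basic principle the abstract builds on: values are encoded as random bitstreams whose probability of a '1' equals the value, so multiplication collapses to an AND gate and scaled addition to a multiplexer. This is a minimal, hypothetical illustration of plain unipolar SC only (the `encode`, `sc_multiply`, and `sc_scaled_add` helpers are invented for this example); it does not reproduce the paper's SC-binary hybrid processing unit, piecewise shift compensation, or systolic array.

```python
# Minimal sketch of unipolar stochastic computing (SC).
# Values in [0, 1] are encoded as random bitstreams with P(bit = 1) = value;
# multiplication of independent streams is a bitwise AND, and (a + b) / 2 is
# obtained by randomly selecting one input per bit position (a 2:1 MUX).
import random

def encode(value, length, rng):
    """Encode a value in [0, 1] as a bitstream of the given length."""
    return [1 if rng.random() < value else 0 for _ in range(length)]

def decode(stream):
    """Estimate the encoded value as the fraction of 1s in the stream."""
    return sum(stream) / len(stream)

def sc_multiply(a_stream, b_stream):
    """Multiply two independent unipolar streams with a bitwise AND."""
    return [a & b for a, b in zip(a_stream, b_stream)]

def sc_scaled_add(a_stream, b_stream, rng):
    """Compute (a + b) / 2 by randomly selecting one input bit per position."""
    return [a if rng.random() < 0.5 else b for a, b in zip(a_stream, b_stream)]

if __name__ == "__main__":
    rng = random.Random(0)
    length = 4096                  # longer streams -> lower estimation variance
    a, b = 0.8, 0.6
    sa = encode(a, length, rng)
    sb = encode(b, length, rng)
    print("a*b     ~", decode(sc_multiply(sa, sb)))          # close to 0.48
    print("(a+b)/2 ~", decode(sc_scaled_add(sa, sb, rng)))   # close to 0.70
```

The example also hints at SC's accuracy-versus-latency trade-off: the decoded estimates converge only as the bitstream length grows, which is the kind of accuracy loss that hybrid SC-binary units and compensation schemes such as the one described in the abstract aim to mitigate.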
Source Journal: IEEE Transactions on Very Large Scale Integration (VLSI) Systems
CiteScore: 6.40
Self-citation rate: 7.10%
Articles per year: 187
Review time: 3.6 months

Journal description: The IEEE Transactions on VLSI Systems is published as a monthly journal under the co-sponsorship of the IEEE Circuits and Systems Society, the IEEE Computer Society, and the IEEE Solid-State Circuits Society. Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing and systems applications. Generation of specifications, design and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor and process levels. To address this critical area through a common forum, the IEEE Transactions on VLSI Systems has been founded. The editorial board, consisting of international experts, invites original papers which emphasize and merit the novel systems integration aspects of microelectronic systems, including interactions among systems design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chips and wafer fabrication, testing and packaging, and systems-level qualification. Thus, the coverage of these Transactions will focus on VLSI/ULSI microelectronic systems integration.