An Area and Energy-Efficient Systolic Array Accelerator Architecture for Deep Neural Networks Using Stochastic Computing
Jingwei Zhu, Jingguo Wu, Zongru Yang, Yu Jiang, Yun Chen
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 6, pp. 1582-1595. Published 2025-03-24. DOI: 10.1109/TVLSI.2025.3550786. IEEE Xplore: https://ieeexplore.ieee.org/document/10937936/
Deep neural networks (DNNs) are widely used to handle various intelligent tasks. As model sizes grow, DNN hardware accelerators face increasing area overhead and energy consumption. Stochastic computing (SC) has recently been considered as a way to implement DNNs with lower hardware cost. However, many current SC-based DNN accelerators fail to balance accuracy, performance, and resource overhead. In addition, their limited scalability and flexibility restrict their use in edge devices. In this article, we design an area- and energy-efficient DNN accelerator architecture using SC. To improve SC accuracy, we propose an SC-binary hybrid processing unit with piecewise shift compensation that adds no significant hardware overhead. To balance performance and resource overhead, we conduct a design space exploration (DSE) from an overall architectural perspective. We also establish an experimental platform, comprising both software and hardware, for SC-based DNNs. Software simulation results show that the designed SC-DNN achieves a best accuracy of 91.9% on CIFAR-10, which is 3.2% higher than that of previous SC-DNN work. The hardware's VLSI implementation is synthesized in a TSMC 28-nm CMOS process. Compared to its binary computing counterpart, our design achieves $2.7\times$ the area efficiency and $3.4\times$ the energy efficiency. Compared to other SC-DNN accelerator designs, it provides $5.3\times$ the area efficiency and $7.3\times$ the energy efficiency.
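To make the stochastic-computing idea above concrete, the following is a minimal sketch of generic unipolar SC multiplication combined with binary accumulation. The abstract does not describe the internals of the proposed SC-binary hybrid processing unit or its piecewise shift compensation, so this is not the authors' design: the unipolar encoding, the function names (to_stream, sc_multiply, hybrid_mac), the stream lengths, and the use of a software RNG in place of hardware stochastic number generators (such as LFSRs) are all assumptions made for illustration.

```python
import random


def to_stream(p: float, length: int, rng: random.Random) -> list[int]:
    """Encode p in [0, 1] as a unipolar stochastic bitstream where P(bit = 1) = p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("unipolar SC values must lie in [0, 1]")
    # rng.random() is uniform on [0, 1), so p = 0 gives all zeros and p = 1 all ones.
    return [1 if rng.random() < p else 0 for _ in range(length)]


def sc_multiply(a: list[int], b: list[int]) -> list[int]:
    """Bitwise AND of two independent unipolar streams; its ones-density estimates a * b."""
    return [x & y for x, y in zip(a, b)]


def hybrid_mac(xs: list[float], ws: list[float], length: int = 1024, seed: int = 0) -> float:
    """Hypothetical SC-binary hybrid MAC: stochastic products, binary accumulation.

    Each product stream is reduced to a binary count of its ones (as a counter
    would do in hardware); the counts are summed exactly in binary and divided by
    the stream length to estimate sum(x * w).
    """
    if len(xs) != len(ws):
        raise ValueError("xs and ws must have the same length")
    rng = random.Random(seed)  # software stand-in for hardware SNGs (e.g., LFSRs)
    total_ones = 0
    for x, w in zip(xs, ws):
        # Drawing both streams from successive RNG outputs keeps them (pseudo-)independent.
        product = sc_multiply(to_stream(x, length, rng), to_stream(w, length, rng))
        total_ones += sum(product)  # binary popcount of the product stream
    return total_ones / length


if __name__ == "__main__":
    xs, ws = [0.5, 0.25, 0.75], [0.5, 0.8, 0.4]
    exact = sum(x * w for x, w in zip(xs, ws))
    for n in (64, 256, 1024, 4096):
        print(f"L={n:5d}  SC estimate={hybrid_mac(xs, ws, length=n):.4f}  exact={exact:.4f}")
```

For independent random streams, the standard deviation of each ones-density estimate falls roughly as 1/sqrt(L), so stream length trades accuracy against latency; this is one source of the accuracy and performance tension the abstract describes. Summing the counts in binary, rather than with a MUX-based stochastic adder that scales its output down, avoids the precision loss of purely stochastic accumulation.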
About the journal:
The IEEE Transactions on VLSI Systems is published as a monthly journal under the co-sponsorship of the IEEE Circuits and Systems Society, the IEEE Computer Society, and the IEEE Solid-State Circuits Society.
Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing and systems applications. Generation of specifications, design and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor and process levels.
To address this critical area through a common forum, the IEEE Transactions on VLSI Systems was founded. The editorial board, consisting of international experts, invites original papers that emphasize the novel systems-integration aspects of microelectronic systems, including interactions among systems design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chip and wafer fabrication, testing and packaging, and systems-level qualification. Accordingly, these Transactions focus on VLSI/ULSI microelectronic systems integration.