{"title":"C-DNN V2: Complementary Deep-Neural-Network Processor With Full-Adder/OR-Based Reduction Tree and Reconfigurable Spatial Weight Reuse","authors":"Sangyeob Kim;Hoi-Jun Yoo","doi":"10.1109/JETCAS.2023.3321771","DOIUrl":null,"url":null,"abstract":"In this article, we propose a Complementary Deep-Neural-Network (C-DNN) processor V2 by optimizing the performance improvement from combination of CNN and SNN. C-DNN V1 showcased the potential for achieving higher energy efficiency by combining CNN and SNN. However, it encountered 5 challenges that hindered the full realization of this potential: Inefficiency of the clock gating accumulator, imbalance in spike sparsity across different time-steps, redundant cache power stemming from temporal weight reuse, limited performance of the SNN core for dense spike trains, and nonoptimal operation resulting from tile-based workload division. To overcome these challenges and achieve enhanced energy efficiency through the CNN-SNN combination, C-DNN V2 is developed. It addresses these challenges by implementing a Full-Adder/OR-based reduction tree, which reduces power consumption in the SNN core under high spike sparsity conditions. Additionally, it efficiently manages spike sparsity imbalances between dense and sparse SNN cores by integrating them simultaneously. The proposed reconfigurable spatial weight reuse method decreases the number of redundant register files and their power consumption. The spike flipping and inhibition method facilitate efficient processing of input data with high spike sparsity in the SNN core. Furthermore, fine-grained workload division and a high sparsity-aware CNN core are introduced to ensure optimal processing of each data in the domain with the highest energy efficiency. In conclusion, we propose the C-DNN V2 as an optimal complementary DNN processor, delivering 76.9% accuracy for ImageNet classification with a state-of-the-art energy efficiency of 32.8 TOPS/W.","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":"13 4","pages":"1026-1039"},"PeriodicalIF":3.7000,"publicationDate":"2023-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10271336/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citation count: 0
Abstract
In this article, we propose the Complementary Deep-Neural-Network (C-DNN) processor V2, which optimizes the performance improvement obtained by combining a CNN and an SNN. C-DNN V1 showcased the potential for higher energy efficiency through the CNN-SNN combination, but it encountered five challenges that hindered the full realization of this potential: inefficiency of the clock-gating accumulator, imbalance in spike sparsity across different time-steps, redundant cache power stemming from temporal weight reuse, limited performance of the SNN core for dense spike trains, and nonoptimal operation resulting from tile-based workload division. C-DNN V2 is developed to overcome these challenges and achieve higher energy efficiency from the CNN-SNN combination. It implements a Full-Adder/OR-based reduction tree, which reduces power consumption in the SNN core under high spike sparsity, and it manages the spike-sparsity imbalance by integrating dense and sparse SNN cores that operate simultaneously. The proposed reconfigurable spatial weight reuse method decreases the number of redundant register files and their power consumption, while the spike flipping and inhibition methods allow input data to be processed with high spike sparsity in the SNN core. Furthermore, fine-grained workload division and a high-sparsity-aware CNN core are introduced so that each piece of data is processed in the domain with the highest energy efficiency. In conclusion, we propose C-DNN V2 as an optimal complementary DNN processor, delivering 76.9% accuracy on ImageNet classification with a state-of-the-art energy efficiency of 32.8 TOPS/W.
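To make the Full-Adder/OR-based reduction tree concrete, below is a minimal behavioral sketch in Python (our own illustration under stated assumptions, not the paper's circuit): a tree node only needs real full-adder logic when both of its operands are nonzero; when at most one operand is nonzero, the node's output equals the bitwise OR of its inputs, so cheap OR logic suffices. Counting the two node types at different sparsity levels suggests why such a tree saves power on sparse spike trains.

```python
import random

def or_adder_reduce(spikes):
    """Reduce a list of spike values to their sum with a binary tree.

    A node whose operands are both nonzero needs full-adder logic
    (a carry can occur); if at most one operand is nonzero, the sum
    equals the bitwise OR of the operands, so an OR gate suffices.
    Returns (total, or_nodes, adder_nodes).
    """
    level = list(spikes)
    or_nodes = adder_nodes = 0
    while len(level) > 1:
        nxt = []
        for a, b in zip(level[0::2], level[1::2]):
            if a and b:
                adder_nodes += 1   # both inputs active: full adder exercised
            else:
                or_nodes += 1      # one input is zero, so a + b == a | b
            nxt.append(a + b)
        if len(level) % 2:         # odd leftover element passes through
            nxt.append(level[-1])
        level = nxt
    return level[0], or_nodes, adder_nodes

random.seed(0)
for sparsity in (0.9, 0.5):       # fraction of zero bins in the spike train
    spikes = [int(random.random() > sparsity) for _ in range(1024)]
    total, ors, adds = or_adder_reduce(spikes)
    print(f"sparsity={sparsity}: sum={total}, OR nodes={ors}, adder nodes={adds}")
```

At 90% sparsity, the overwhelming majority of the 1023 tree nodes fall on the cheap OR path, while at 50% sparsity far more nodes must act as full adders, which is consistent with the abstract's claim that the tree pays off under high spike sparsity.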
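The spike flipping idea can be illustrated in the same spirit (a hedged sketch based on our reading of the abstract; the function and variable names are hypothetical, not from the paper): when a spike train is dense, its complement is sparse, and the weighted accumulation can be recovered from the complement via sum_i w_i * s_i = sum_i w_i - sum_i w_i * (1 - s_i), so a sparsity-driven SNN core can always operate on high-sparsity data.

```python
import numpy as np

def weighted_spike_sum(weights, spikes):
    """Accumulate weights at spike positions, flipping dense trains.

    If more than half the bins spike, visit the (sparser) non-spiking
    positions instead and correct with the total weight sum:
    sum(w * s) == sum(w) - sum(w * (1 - s)).
    """
    spikes = np.asarray(spikes, dtype=np.int8)
    if spikes.mean() > 0.5:                    # dense train: process the flipped version
        return weights.sum() - weights[spikes == 0].sum()
    return weights[spikes == 1].sum()          # sparse train: direct accumulation

rng = np.random.default_rng(0)
w = rng.standard_normal(256).astype(np.float32)
dense = (rng.random(256) < 0.9).astype(np.int8)   # ~90% of bins spike
assert np.isclose(weighted_spike_sum(w, dense), float((w * dense).sum()), atol=1e-4)
```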
Journal Introduction:
The IEEE Journal on Emerging and Selected Topics in Circuits and Systems is published quarterly and solicits, with particular emphasis on emerging areas, special issues on topics that cover the entire scope of the IEEE Circuits and Systems (CAS) Society, namely the theory, analysis, design, tools, and implementation of circuits and systems, spanning their theoretical foundations, applications, and architectures for signal and information processing.