C-DNN V2: Complementary Deep-Neural-Network Processor With Full-Adder/OR-Based Reduction Tree and Reconfigurable Spatial Weight Reuse

Impact Factor: 3.7 · CAS Region 2 (Engineering & Technology) · JCR Q2 (Engineering, Electrical & Electronic)
Sangyeob Kim;Hoi-Jun Yoo
DOI: 10.1109/JETCAS.2023.3321771
Journal: IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 13, no. 4, pp. 1026-1039
Publication date: 2023-10-04
Full text: https://ieeexplore.ieee.org/document/10271336/
Citations: 0

Abstract

In this article, we propose a Complementary Deep-Neural-Network (C-DNN) processor V2 by optimizing the performance improvement from combination of CNN and SNN. C-DNN V1 showcased the potential for achieving higher energy efficiency by combining CNN and SNN. However, it encountered 5 challenges that hindered the full realization of this potential: Inefficiency of the clock gating accumulator, imbalance in spike sparsity across different time-steps, redundant cache power stemming from temporal weight reuse, limited performance of the SNN core for dense spike trains, and nonoptimal operation resulting from tile-based workload division. To overcome these challenges and achieve enhanced energy efficiency through the CNN-SNN combination, C-DNN V2 is developed. It addresses these challenges by implementing a Full-Adder/OR-based reduction tree, which reduces power consumption in the SNN core under high spike sparsity conditions. Additionally, it efficiently manages spike sparsity imbalances between dense and sparse SNN cores by integrating them simultaneously. The proposed reconfigurable spatial weight reuse method decreases the number of redundant register files and their power consumption. The spike flipping and inhibition method facilitate efficient processing of input data with high spike sparsity in the SNN core. Furthermore, fine-grained workload division and a high sparsity-aware CNN core are introduced to ensure optimal processing of each data in the domain with the highest energy efficiency. In conclusion, we propose the C-DNN V2 as an optimal complementary DNN processor, delivering 76.9% accuracy for ImageNet classification with a state-of-the-art energy efficiency of 32.8 TOPS/W.
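The abstract describes the Full-Adder/OR-based reduction tree and the spike flipping method only at a high level; the actual designs are hardware circuits detailed in the paper. As a rough software analogue of the two ideas (all function names and the flipping threshold below are illustrative assumptions, not the authors' design): a full-adder tree produces an exact spike count, an OR reduction collapses a sparse spike vector to a single cheap "any spike fired" bit, and flipping lets a dense train be counted via its sparse complement.

```python
def adder_tree_popcount(spikes):
    """Exact spike count via a pairwise (full-adder-style) reduction tree."""
    level = list(spikes)
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # odd element passes through to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0]

def or_tree(spikes):
    """OR reduction: 1 iff any spike fired. Under high sparsity this
    single-bit result is often all that is needed, at far lower cost."""
    out = 0
    for s in spikes:
        out |= s
    return out

def count_with_flipping(spikes):
    """Spike flipping (toy version): if the train is dense, invert it and
    count the sparse complement instead, then subtract from the length."""
    n = len(spikes)
    if sum(spikes) * 2 > n:         # assumed density threshold for flipping
        return n - adder_tree_popcount([1 - s for s in spikes])
    return adder_tree_popcount(spikes)
```

The point of the reconfigurable FA/OR tree in hardware is that an OR node is much cheaper than a full adder, so when sparsity guarantees at most one active spike per subtree the exact adders can be bypassed; the sketch above only mimics the functional behavior, not the power trade-off.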
Source journal metrics:
CiteScore: 8.50
Self-citation rate: 2.20%
Articles per year: 86
Journal description: The IEEE Journal on Emerging and Selected Topics in Circuits and Systems is published quarterly and solicits, with particular emphasis on emerging areas, special issues on topics that cover the entire scope of the IEEE Circuits and Systems (CAS) Society, namely the theory, analysis, design, tools, and implementation of circuits and systems, spanning their theoretical foundations, applications, and architectures for signal and information processing.