{"title":"通过重复使用提高DNN的GEMM加速器的能效","authors":"Nihat Mert Cicek, Xipeng Shen, O. Ozturk","doi":"10.1145/3503469","DOIUrl":null,"url":null,"abstract":"Reuse-centric convolutional neural networks (CNN) acceleration speeds up CNN inference by reusing computations for similar neuron vectors in CNN’s input layer or activation maps. This new paradigm of optimizations is, however, largely limited by the overheads in neuron vector similarity detection, an important step in reuse-centric CNN. This article presents an in-depth exploration of architectural support for reuse-centric CNN. It addresses some major limitations of the state-of-the-art design and proposes a novel hardware accelerator that improves neuron vector similarity detection and reduces the energy consumption of reuse-centric CNN inference. The accelerator is implemented to support a wide variety of neural network settings with a banked memory subsystem. Design exploration is performed through RTL simulation and synthesis on an FPGA platform. When integrated into Eyeriss, the accelerator can potentially provide improvements up to 7.75 \\( \\times \\) in performance. Furthermore, it can reduce the energy used for similarity detection up to 95.46%, and it can accelerate the convolutional layer up to 3.63 \\( \\times \\) compared to the software-based implementation running on the CPU.","PeriodicalId":6933,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems (TODAES)","volume":"29 1","pages":"1 - 26"},"PeriodicalIF":0.0000,"publicationDate":"2022-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Energy Efficient Boosting of GEMM Accelerators for DNN via Reuse\",\"authors\":\"Nihat Mert Cicek, Xipeng Shen, O. Ozturk\",\"doi\":\"10.1145/3503469\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Reuse-centric convolutional neural networks (CNN) acceleration speeds up CNN inference by reusing computations for similar neuron vectors in CNN’s input layer or activation maps. This new paradigm of optimizations is, however, largely limited by the overheads in neuron vector similarity detection, an important step in reuse-centric CNN. This article presents an in-depth exploration of architectural support for reuse-centric CNN. It addresses some major limitations of the state-of-the-art design and proposes a novel hardware accelerator that improves neuron vector similarity detection and reduces the energy consumption of reuse-centric CNN inference. The accelerator is implemented to support a wide variety of neural network settings with a banked memory subsystem. Design exploration is performed through RTL simulation and synthesis on an FPGA platform. When integrated into Eyeriss, the accelerator can potentially provide improvements up to 7.75 \\\\( \\\\times \\\\) in performance. 
Furthermore, it can reduce the energy used for similarity detection up to 95.46%, and it can accelerate the convolutional layer up to 3.63 \\\\( \\\\times \\\\) compared to the software-based implementation running on the CPU.\",\"PeriodicalId\":6933,\"journal\":{\"name\":\"ACM Transactions on Design Automation of Electronic Systems (TODAES)\",\"volume\":\"29 1\",\"pages\":\"1 - 26\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-02-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Design Automation of Electronic Systems (TODAES)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3503469\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Design Automation of Electronic Systems (TODAES)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3503469","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2
Abstract
Reuse-centric convolutional neural network (CNN) acceleration speeds up CNN inference by reusing the computations of similar neuron vectors in a CNN's input layer or activation maps. This new optimization paradigm is, however, largely limited by the overhead of neuron vector similarity detection, an important step in reuse-centric CNN acceleration. This article presents an in-depth exploration of architectural support for reuse-centric CNN. It addresses major limitations of the state-of-the-art design and proposes a novel hardware accelerator that improves neuron vector similarity detection and reduces the energy consumption of reuse-centric CNN inference. The accelerator supports a wide variety of neural network settings through a banked memory subsystem. Design exploration is performed through RTL simulation and synthesis on an FPGA platform. When integrated into Eyeriss, the accelerator can potentially provide performance improvements of up to 7.75×. Furthermore, it can reduce the energy used for similarity detection by up to 95.46% and accelerate the convolutional layers by up to 3.63× compared to a software-based implementation running on a CPU.
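The abstract does not spell out how similarity detection and reuse are performed. As a rough software-level illustration of the general reuse-centric idea only (not the paper's hardware accelerator or its actual detection scheme), the Python/NumPy sketch below groups activation rows under a coarse rounding-based signature and multiplies each group by the weight matrix once, reusing the cached result for later rows. The function name, the rounding signature, and all parameters are illustrative assumptions.

```python
# Minimal sketch of reuse-centric GEMM (illustrative assumption, not the
# paper's design): rows sharing a coarse signature are computed once and reused.
import numpy as np

def reuse_centric_gemm(acts, weights, decimals=1):
    """acts: (n, k) activation rows; weights: (k, m) weight matrix.
    Reuses the output row of any earlier activation row that has the same
    rounded signature instead of recomputing its dot products."""
    out = np.empty((acts.shape[0], weights.shape[1]),
                   dtype=np.result_type(acts, weights))
    cache = {}                                # signature -> computed output row
    for i, row in enumerate(acts):
        key = tuple(np.round(row, decimals))  # crude similarity signature (assumption)
        if key not in cache:                  # compute once per group of similar rows
            cache[key] = row @ weights
        out[i] = cache[key]                   # reuse the cached result
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy activations drawn from a few quantization levels so many rows repeat,
    # mimicking the neuron-vector similarity that reuse-centric CNN exploits.
    acts = rng.choice([0.0, 0.5, 1.0], size=(256, 4))
    weights = rng.random((4, 8))
    reused = reuse_centric_gemm(acts, weights)
    print("unique signatures:", len({tuple(r) for r in acts}), "of", len(acts), "rows")
    print("max abs error vs. dense GEMM:", np.abs(reused - acts @ weights).max())
```

The signature lookup in this sketch stands in for the similarity-detection step whose overhead, per the abstract, limits reuse-centric CNN in software and is the step the proposed accelerator targets in hardware.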