Ax-BxP: Approximate Blocked Computation for Precision-reconfigurable Deep Neural Network Acceleration

ACM Transactions on Design Automation of Electronic Systems (TODAES). Pub Date: 2020-11-25. DOI: 10.1145/3492733

Reena Elangovan, Shubham Jain, A. Raghunathan

Publication type: Journal Article. Volume/issue as listed: 7 1. Pages: 1-20. Open access: no. Source: Semantic Scholar. URL: https://doi.org/10.1145/3492733

Citations: 4

Abstract

Precision scaling has emerged as a popular technique to optimize the compute and storage requirements of Deep Neural Networks (DNNs). Efforts toward creating ultra-low-precision (sub-8-bit) DNNs for efficient inference suggest that the minimum precision required to achieve a given network-level accuracy varies considerably across networks, and even across layers within a network. This translates to a need to support variable precision computation in DNN hardware. Previous proposals for precision-reconfigurable hardware, such as bit-serial architectures, incur high overheads, significantly diminishing the benefits of lower precision. We propose Ax-BxP, a method for approximate blocked computation wherein each multiply-accumulate operation is performed block-wise (a block is a group of bits), facilitating re-configurability at the granularity of blocks. Further, approximations are introduced by only performing a subset of the required block-wise computations to realize precision re-configurability with high efficiency. We design a DNN accelerator that embodies approximate blocked computation and propose a method to determine a suitable approximation configuration for any given DNN. For the AlexNet, ResNet50, and MobileNetV2 DNNs, Ax-BxP achieves improvement in system energy and performance, respectively, over an 8-bit fixed-point (FxP8) baseline, with minimal loss (<1% on average) in classification accuracy. Further, by varying the approximation configurations at a finer granularity across layers and data-structures within a DNN, we achieve improvement in system energy and performance, respectively.
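The block-wise idea described in the abstract can be illustrated with a small sketch. This is not the authors' accelerator design or their configuration method. It only shows how a multiplication decomposes into block-by-block partial products, and how skipping some of them trades accuracy for work. The function names, the 2-bit block size, the unsigned operands, and the rule of keeping the most significant partial products are all assumptions made for illustration.

```python
def split_blocks(x, block_bits, num_blocks):
    """Split a non-negative integer into blocks, least significant block first."""
    mask = (1 << block_bits) - 1
    return [(x >> (i * block_bits)) & mask for i in range(num_blocks)]


def blocked_multiply(a, b, block_bits=2, num_blocks=4, keep=None):
    """Multiply two unsigned integers block-wise (hypothetical helper).

    Each pair of blocks (i, j) contributes a_i * b_j shifted left by
    block_bits * (i + j). With keep=None every pair is summed and the
    result is exact. With keep=k only the k pairs with the largest shift
    are computed; this selection rule is an assumption, not the paper's.
    """
    a_blocks = split_blocks(a, block_bits, num_blocks)
    b_blocks = split_blocks(b, block_bits, num_blocks)
    pairs = [(i, j) for i in range(num_blocks) for j in range(num_blocks)]
    # Most significant partial products first.
    pairs.sort(key=lambda p: p[0] + p[1], reverse=True)
    if keep is not None:
        pairs = pairs[:keep]
    return sum(
        (a_blocks[i] * b_blocks[j]) << (block_bits * (i + j)) for i, j in pairs
    )


def blocked_mac(weights, activations, **kwargs):
    """Multiply-accumulate over two sequences using blocked multiplication."""
    return sum(blocked_multiply(w, x, **kwargs) for w, x in zip(weights, activations))


if __name__ == "__main__":
    exact = blocked_multiply(200, 123)            # all 16 block pairs
    approx = blocked_multiply(200, 123, keep=6)   # only 6 of 16 block pairs
    print(exact, approx, exact - approx)
```

Because the operands here are unsigned, every skipped partial product is non-negative, so the approximate result never exceeds the exact one. Signed fixed-point operands would need additional handling that this sketch leaves out.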

Source journal

ACM Transactions on Design Automation of Electronic Systems (TODAES)

Self-citation rate

0.00%

Number of publications