Data Footprint Reduction in DNN Inference by Sensitivity-Controlled Approximations with Online Arithmetic

2020 23rd Euromicro Conference on Digital System Design (DSD). Pub Date: 2020-08-01. DOI: 10.1109/DSD51259.2020.00089

Abdus Sami Hassan, Tooba Arifeen, Jeong-A Lee


Citations: 7

Abstract

In deep neural network (DNN) inference, researchers have been trying to reduce the number of computations and connections without performance degradation, moving from a bit-parallel to a bit-serial mode of arithmetic. In this regard, approximations expressed as a mixed-precision profile for among-layer mixed precision on bit-serial architectures have been adopted in the literature. However, the introduction of within-layer mixed precision through controlled approximations for low-latency DNN architectures is yet to be studied. For DNN inference in this study, we apply online arithmetic, an unconventional computation technique that serially generates the most significant digits first (MSDF) and then terminates computation according to the required precision. Specifically, Taylor expansion-based sensitivity analysis guides the within-layer mixed-precision method in choosing the approximation intensity (desired bits) for the weights and activations of convolutional layers. In turn, the within-layer mixed-precision method drives the termination of the convolution operation, which is carried out using an online multiplier. Hence, we aim to reduce the data footprint through early terminations enabled by within-layer mixed precision, rather than among-layer mixed precision, for online convolution. In this manner, convolution operations compute no more most significant digits than necessary, overcoming the data footprint bottleneck for in-demand edge computing devices.
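The interplay between sensitivity-guided precision and MSDF early termination can be illustrated with a small Python sketch. It is not the authors' implementation: real online arithmetic uses a redundant signed-digit representation and an online multiplier with a fixed delay, whereas this sketch uses plain unsigned binary fractions in [0, 1). The function names, the tolerance `tol`, and the rule "pick the smallest b with |grad| * 2^-b <= tol" are assumptions made for illustration; the abstract does not state the exact selection rule.

```python
import math

def msdf_bits(x, n_bits):
    """Yield the first n_bits binary fraction digits of x in [0, 1), most significant first."""
    for _ in range(n_bits):
        x *= 2
        bit = int(x)  # 0 or 1
        yield bit
        x -= bit

def truncate_msdf(x, n_bits):
    """Value rebuilt from only the n_bits most significant digits (early termination)."""
    return sum(bit * 2.0 ** -(i + 1) for i, bit in enumerate(msdf_bits(x, n_bits)))

def bits_for_sensitivity(grad, tol, max_bits=8):
    """Hypothetical rule: smallest b such that |grad| * 2**-b <= tol.

    This is a first-order Taylor estimate of the output change caused by a
    truncation error of at most 2**-b. The result is clamped to [1, max_bits],
    so the bound may not hold when the clamp at max_bits applies.
    """
    if grad == 0:
        return 1
    b = math.ceil(math.log2(abs(grad) / tol))
    return max(1, min(max_bits, b))

def approx_dot(weights, activations, grads, tol, max_bits=8):
    """Dot product in which each term stops after a sensitivity-chosen digit count.

    Returns the approximate sum and the total number of digits consumed
    per operand stream, a rough proxy for the data footprint.
    """
    total = 0.0
    digits_used = 0
    for w, a, g in zip(weights, activations, grads):
        b = bits_for_sensitivity(g, tol, max_bits)
        total += truncate_msdf(w, b) * truncate_msdf(a, b)
        digits_used += b
    return total, digits_used
```

In this sketch a term with a large sensitivity keeps more leading digits, while a term with a small sensitivity terminates after very few, so precision varies within a single layer rather than only between layers.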

