{"title":"基于在线算法的灵敏度控制逼近减少DNN推理中的数据占用","authors":"Abdus Sami Hassan, Tooba Arifeen, Jeong-A Lee","doi":"10.1109/DSD51259.2020.00089","DOIUrl":null,"url":null,"abstract":"In deep neural network (DNN) inference, researchers have been trying to reduce the number of computations and connections without performance degradation, departing from a bit-parallel to a bit-serial mode of arithmetic. In this regard, approximations translated as the mixed-precision profile for among-layer-mixed-precision through bit-serial architecture have been adopted in the literature. However, the introduction of within-layer mixed precision through controlled approximations for low-latency DNN architecture is yet to be studied. For DNN inference in this study, we apply an unconventional computation technique of online arithmetic, which serially generates the most significant digits first(MSDF) and then terminates computation according to the required precision. Specifically, Taylor expansion-based sensitivity analysis guides the within-layer-mixed-precision method for the choice of approximation intensity (desired bits) for weights and activations of convolutional layers. In turn, the within-layer-mixed-precision method drives the termination of the convolution operation carried out using an online multiplier. Hence, we aim to reduce the data footprint by early terminations achieved thanks to the insightful nature of within-layer-mixed-precision instead of among-layer-mixedprecision for online convolution. 
In this manner, convolution operations compute not-more-than-necessary most significant digits to overcome the bottleneck of data footprint for in-demand edge computing devices.","PeriodicalId":128527,"journal":{"name":"2020 23rd Euromicro Conference on Digital System Design (DSD)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"Data Footprint Reduction in DNN Inference by Sensitivity-Controlled Approximations with Online Arithmetic\",\"authors\":\"Abdus Sami Hassan, Tooba Arifeen, Jeong-A Lee\",\"doi\":\"10.1109/DSD51259.2020.00089\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In deep neural network (DNN) inference, researchers have been trying to reduce the number of computations and connections without performance degradation, departing from a bit-parallel to a bit-serial mode of arithmetic. In this regard, approximations translated as the mixed-precision profile for among-layer-mixed-precision through bit-serial architecture have been adopted in the literature. However, the introduction of within-layer mixed precision through controlled approximations for low-latency DNN architecture is yet to be studied. For DNN inference in this study, we apply an unconventional computation technique of online arithmetic, which serially generates the most significant digits first(MSDF) and then terminates computation according to the required precision. Specifically, Taylor expansion-based sensitivity analysis guides the within-layer-mixed-precision method for the choice of approximation intensity (desired bits) for weights and activations of convolutional layers. In turn, the within-layer-mixed-precision method drives the termination of the convolution operation carried out using an online multiplier. 
Hence, we aim to reduce the data footprint by early terminations achieved thanks to the insightful nature of within-layer-mixed-precision instead of among-layer-mixedprecision for online convolution. In this manner, convolution operations compute not-more-than-necessary most significant digits to overcome the bottleneck of data footprint for in-demand edge computing devices.\",\"PeriodicalId\":128527,\"journal\":{\"name\":\"2020 23rd Euromicro Conference on Digital System Design (DSD)\",\"volume\":\"8 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-08-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 23rd Euromicro Conference on Digital System Design (DSD)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/DSD51259.2020.00089\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 23rd Euromicro Conference on Digital System Design (DSD)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DSD51259.2020.00089","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Data Footprint Reduction in DNN Inference by Sensitivity-Controlled Approximations with Online Arithmetic
In deep neural network (DNN) inference, researchers have been trying to reduce the number of computations and connections without performance degradation, moving from a bit-parallel to a bit-serial mode of arithmetic. In this regard, the literature has adopted approximations realized as among-layer mixed-precision profiles on bit-serial architectures. However, introducing within-layer mixed precision through controlled approximations for low-latency DNN architectures has yet to be studied. For DNN inference in this study, we apply the unconventional computation technique of online arithmetic, which serially generates the most significant digits first (MSDF) and terminates computation once the required precision is reached. Specifically, Taylor expansion-based sensitivity analysis guides the within-layer mixed-precision method in choosing the approximation intensity (desired bits) for the weights and activations of convolutional layers. In turn, the within-layer mixed-precision method drives the termination of the convolution operation carried out with an online multiplier. Hence, we aim to reduce the data footprint through early terminations, enabled by the finer granularity of within-layer rather than among-layer mixed precision for online convolution. In this manner, convolution operations compute no more most-significant digits than necessary, overcoming the data-footprint bottleneck for in-demand edge computing devices.
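The core idea above can be sketched numerically. A minimal illustration, not the paper's implementation: `approx_product` emulates terminating an MSDF online multiplier after a chosen number of output digits by keeping only that many most-significant fractional bits of the product, and `sensitivity_digits` stands in for the Taylor-expansion-based analysis with a simple magnitude-based proxy (all function names, the median threshold, and the 4/8-digit budgets are hypothetical choices for this sketch).

```python
import math

def approx_product(a, b, digits):
    """Keep only the `digits` most significant fractional bits of a*b
    (a, b assumed to lie in [0, 1)). This emulates stopping an MSDF
    online multiplier after `digits` output digits; the truncation
    error is bounded by 2**-digits."""
    scale = 1 << digits
    return math.floor(a * b * scale) / scale

def sensitivity_digits(weights, activations, low=4, high=8):
    """Toy within-layer precision assignment: terms whose first-order
    sensitivity proxy |w*x| reaches the median get `high` digits, the
    rest get `low` digits. A stand-in for the paper's Taylor
    expansion-based sensitivity analysis."""
    s = sorted(abs(w * x) for w, x in zip(weights, activations))
    median = s[len(s) // 2]
    return [high if abs(w * x) >= median else low
            for w, x in zip(weights, activations)]

def mixed_precision_dot(weights, activations):
    """Dot product (the inner loop of a convolution) in which each
    term is terminated early at its own, sensitivity-chosen precision."""
    digits = sensitivity_digits(weights, activations)
    return sum(approx_product(w, x, d)
               for w, x, d in zip(weights, activations, digits))
```

Because every truncated term errs by at most 2**-digits, the total error of the dot product is bounded by the sum of the per-term budgets, while low-sensitivity terms stop after far fewer digits, which is exactly the data-footprint saving the abstract targets (signs are omitted here for simplicity; inputs are taken as nonnegative).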