Analysis of 32-bit Fixed Point Quantizer in the Wide Variance Range for the Laplacian Source
Z. Perić, A. Jovanovic, M. Dincic, Milan S. Savic, N. Vučić, Anastasija Nikolić
2021 15th International Conference on Advanced Technologies, Systems and Services in Telecommunications (TELSIKS), 2021-10-20
DOI: 10.1109/TELSIKS52058.2021.9606251 (https://doi.org/10.1109/TELSIKS52058.2021.9606251)
Citations: 2
Abstract
The main goal of this paper is to examine the possibility of using the 32-bit fixed-point format, instead of the standard 32-bit floating-point format, to represent the weights of neural networks (NN) in order to reduce the complexity of NN implementation. To this end, the performance of the 32-bit fixed-point format is analyzed using an analogy between the fixed-point format and uniform quantization, which allows the performance of the 32-bit fixed-point format to be expressed by the objective measure SQNR (Signal-to-Quantization Noise Ratio). The SQNR analysis is performed over a wide range of NN weight variances, seeking a solution that maximizes the average SQNR over that range. In addition, an experiment is performed in which the 32-bit fixed-point format is used to represent the weights of an MLP (Multilayer Perceptron) neural network trained for classification. It is shown that the 32-bit fixed-point representation of MLP weights achieves the same classification accuracy as the 32-bit floating-point representation, demonstrating that the 32-bit fixed-point representation of weights reduces the implementation complexity of neural networks without compromising classification accuracy.
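As a rough illustration of the analysis described in the abstract, the sketch below models a 32-bit fixed-point representation as a uniform quantizer applied to Laplacian-distributed weights and measures the resulting SQNR across several variances. The bit allocation (one sign bit, 23 fractional bits), the variance grid, and all function names are illustrative assumptions for this sketch, not the quantizer design or parameters from the paper.

```python
# Minimal sketch (assumed parameters, not the authors' quantizer): a 32-bit
# fixed-point representation modeled as uniform quantization of Laplacian
# weights, with SQNR evaluated over a wide range of weight variances.
import numpy as np

def fixed_point_quantize(w, frac_bits=23, total_bits=32):
    """Round w onto a signed fixed-point grid with the given fractional bits."""
    step = 2.0 ** (-frac_bits)                   # quantization step size
    max_code = 2.0 ** (total_bits - 1) - 1       # largest representable integer code
    codes = np.clip(np.round(w / step), -max_code - 1, max_code)
    return codes * step

def sqnr_db(w, w_q):
    """Signal-to-quantization-noise ratio in dB."""
    noise = w - w_q
    return 10.0 * np.log10(np.mean(w ** 2) / np.mean(noise ** 2))

rng = np.random.default_rng(0)
for sigma in [0.01, 0.1, 1.0, 10.0]:             # assumed variance grid
    # Laplacian-distributed "weights" with standard deviation sigma
    # (Laplace scale b satisfies variance = 2*b**2).
    w = rng.laplace(scale=sigma / np.sqrt(2.0), size=100_000)
    w_q = fixed_point_quantize(w)
    print(f"sigma = {sigma:6.2f}  SQNR = {sqnr_db(w, w_q):6.2f} dB")
```

In this simplified model the SQNR stays high but varies with the weight variance, since the fixed-point step size is constant while the signal power changes; the paper's contribution is choosing the format so that the average SQNR over the whole variance range is maximized.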