Analysis of 32-bit Fixed Point Quantizer in the Wide Variance Range for the Laplacian Source
Z. Perić, A. Jovanovic, M. Dincic, Milan S. Savic, N. Vučić, Anastasija Nikolić
2021 15th International Conference on Advanced Technologies, Systems and Services in Telecommunications (TELSIKS), 2021-10-20
DOI: 10.1109/TELSIKS52058.2021.9606251 (https://doi.org/10.1109/TELSIKS52058.2021.9606251)
Citations: 2
Abstract
The main goal of this paper is to examine the possibility of using the 32-bit fixed-point format, instead of the standard 32-bit floating-point format, to represent the weights of neural networks (NN) in order to reduce the complexity of NN implementation. To this end, the performance of the 32-bit fixed-point format is analyzed using an analogy between the fixed-point format and uniform quantization, which allows the performance of the 32-bit fixed-point format to be expressed by the objective measure SQNR (Signal-to-Quantization Noise Ratio). The SQNR analysis is performed over a wide range of NN weight variances, seeking a solution that maximizes the average SQNR over that range. In addition, an experiment is performed in which the 32-bit fixed-point format is used to represent the weights of an MLP (Multilayer Perceptron) neural network trained for classification. It is shown that the 32-bit fixed-point representation of MLP weights achieves the same classification accuracy as the 32-bit floating-point representation, demonstrating that the 32-bit fixed-point representation of weights reduces the implementation complexity of neural networks without compromising classification accuracy.
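As a rough illustration of the analysis described in the abstract, the sketch below models a 32-bit fixed-point representation as a uniform quantizer applied to Laplacian-distributed weights and measures the resulting SQNR across several variances. The bit allocation (one sign bit, 23 fractional bits), the variance grid, and all function names are illustrative assumptions for this sketch, not the quantizer design or parameters from the paper.

```python
# Minimal sketch (assumed parameters, not the authors' quantizer): a 32-bit
# fixed-point representation modeled as uniform quantization of Laplacian
# weights, with SQNR evaluated over a wide range of weight variances.
import numpy as np

def fixed_point_quantize(w, frac_bits=23, total_bits=32):
    """Round w onto a signed fixed-point grid with the given fractional bits."""
    step = 2.0 ** (-frac_bits)                   # quantization step size
    max_code = 2.0 ** (total_bits - 1) - 1       # largest representable integer code
    codes = np.clip(np.round(w / step), -max_code - 1, max_code)
    return codes * step

def sqnr_db(w, w_q):
    """Signal-to-quantization-noise ratio in dB."""
    noise = w - w_q
    return 10.0 * np.log10(np.mean(w ** 2) / np.mean(noise ** 2))

rng = np.random.default_rng(0)
for sigma in [0.01, 0.1, 1.0, 10.0]:             # assumed variance grid
    # Laplacian-distributed "weights" with standard deviation sigma
    # (Laplace scale b satisfies variance = 2*b**2).
    w = rng.laplace(scale=sigma / np.sqrt(2.0), size=100_000)
    w_q = fixed_point_quantize(w)
    print(f"sigma = {sigma:6.2f}  SQNR = {sqnr_db(w, w_q):6.2f} dB")
```

In this simplified model the SQNR stays high but varies with the weight variance, since the fixed-point step size is constant while the signal power changes; the paper's contribution is choosing the format so that the average SQNR over the whole variance range is maximized.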