Analysis of 32-bit Fixed Point Quantizer in the Wide Variance Range for the Laplacian Source

Z. Perić, A. Jovanovic, M. Dincic, Milan S. Savic, N. Vučić, Anastasija Nikolić
{"title":"Analysis of 32-bit Fixed Point Quantizer in the Wide Variance Range for the Laplacian Source","authors":"Z. Perić, A. Jovanovic, M. Dincic, Milan S. Savic, N. Vučić, Anastasija Nikolić","doi":"10.1109/TELSIKS52058.2021.9606251","DOIUrl":null,"url":null,"abstract":"The main goal of this paper is to examine the possibility of using the 32-bit fixed-point format to represent the weights of neural networks (NN) instead of the standardly used 32-bit floating-point format in order to reduce the complexity of NN implementation. To this end, the performance of the 32-bit fixed-point format is analyzed, using an analogy between the fixed-point format and the uniform quantization that allows for the performance of the 32-bit fixed-point format to be expressed by an objective measure SQNR (Signal-to-Quantization Noise Ratio). In doing so, SQNR analysis is performed in a wide range of variance of NN weights, looking for a solution that maximizes the average SQNR in that range of variance. Also, an experiment is performed, applying the 32-bit fixed-point format to represent the weights of an MLP (Multilayer Perceptron) neural network trained for classification purposes. It is shown that the application of the 32-bit fixed-point representation of MLP weights achieves the same classification accuracy as in the case of the 32-bit floating-point representation of MLP weights, proving that the application of the 32-bit fixed-point representation of weights reduces the implementation complexity of neural networks without compromising the accuracy of classification.","PeriodicalId":228464,"journal":{"name":"2021 15th International Conference on Advanced Technologies, Systems and Services in Telecommunications (TELSIKS)","volume":"61 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 15th International Conference on Advanced Technologies, Systems and Services in Telecommunications (TELSIKS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TELSIKS52058.2021.9606251","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

The main goal of this paper is to examine the possibility of using the 32-bit fixed-point format to represent the weights of neural networks (NN) instead of the commonly used 32-bit floating-point format, in order to reduce the complexity of NN implementation. To this end, the performance of the 32-bit fixed-point format is analyzed by exploiting an analogy between the fixed-point format and uniform quantization, which allows the performance of the 32-bit fixed-point format to be expressed by an objective measure, the SQNR (Signal-to-Quantization Noise Ratio). The SQNR analysis is performed over a wide range of variances of the NN weights, looking for a solution that maximizes the average SQNR over that variance range. In addition, an experiment is performed in which the 32-bit fixed-point format is applied to represent the weights of an MLP (Multilayer Perceptron) neural network trained for classification. It is shown that the 32-bit fixed-point representation of the MLP weights achieves the same classification accuracy as the 32-bit floating-point representation, demonstrating that the fixed-point representation of weights reduces the implementation complexity of neural networks without compromising classification accuracy.
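The sketch below (not the authors' code) illustrates the analogy the abstract relies on: a signed 32-bit fixed-point representation acts as a uniform quantizer with a fixed step size, so its quality on a set of weights can be measured by the SQNR. The bit allocation (`n_frac` fractional bits), the helper names, and the Laplacian test signal over several variances are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch: 32-bit fixed-point rounding viewed as uniform quantization,
# evaluated by SQNR on Laplacian-distributed "weights" over a range of variances.
import numpy as np

def to_fixed_point(x, n_bits=32, n_frac=24):
    """Round x onto a signed fixed-point grid with n_bits total and n_frac fractional bits."""
    step = 2.0 ** (-n_frac)                    # quantization step of the uniform grid
    max_code = 2 ** (n_bits - 1) - 1           # largest representable integer code
    codes = np.clip(np.round(x / step), -max_code - 1, max_code)
    return codes * step                        # map integer codes back to real values

def sqnr_db(x, xq):
    """Signal-to-Quantization Noise Ratio in dB."""
    signal_power = np.mean(x ** 2)
    noise_power = np.mean((x - xq) ** 2)
    return 10.0 * np.log10(signal_power / noise_power)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Weights modeled as a zero-mean Laplacian source; sweep the variance.
    for sigma2 in [1e-4, 1e-2, 1.0]:
        b = np.sqrt(sigma2 / 2.0)              # Laplace(scale b) has variance 2*b^2
        w = rng.laplace(loc=0.0, scale=b, size=100_000)
        wq = to_fixed_point(w)
        print(f"variance={sigma2:g}  SQNR={sqnr_db(w, wq):.1f} dB")
```

In this framing, the paper's design question is where to place the fixed-point grid (how many fractional bits) so that the average SQNR stays high across the whole variance range of the weights, rather than being tuned to a single variance.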