Analysis of 32-bit Fixed Point Quantizer in the Wide Variance Range for the Laplacian Source

Z. Perić, A. Jovanovic, M. Dincic, Milan S. Savic, N. Vučić, Anastasija Nikolić

2021 15th International Conference on Advanced Technologies, Systems and Services in Telecommunications (TELSIKS), 20 October 2021. DOI: 10.1109/TELSIKS52058.2021.9606251
The main goal of this paper is to examine the possibility of using the 32-bit fixed-point format, instead of the standard 32-bit floating-point format, to represent the weights of neural networks (NN), in order to reduce the complexity of NN implementation. To this end, the performance of the 32-bit fixed-point format is analyzed using an analogy between the fixed-point format and uniform quantization, which allows that performance to be expressed by the objective measure SQNR (signal-to-quantization-noise ratio). The SQNR analysis is carried out over a wide range of variances of the NN weights, seeking the solution that maximizes the average SQNR over that variance range. In addition, an experiment is performed in which the 32-bit fixed-point format is applied to represent the weights of an MLP (multilayer perceptron) neural network trained for classification. It is shown that the 32-bit fixed-point representation of the MLP weights achieves the same classification accuracy as the 32-bit floating-point representation, demonstrating that the fixed-point representation of weights reduces the implementation complexity of neural networks without compromising classification accuracy.
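As a rough illustration of the analogy between the fixed-point format and uniform quantization, the sketch below is an assumption-laden toy example, not the quantizer designed in the paper: it models a 32-bit fixed-point representation as a mid-tread uniform quantizer with 2^32 levels on an assumed support [-x_max, x_max] and estimates SQNR = 10·log10(σ²/D), where D is the mean-squared quantization error, for a Laplacian source over a wide range of variances. The word length, the support x_max and the quantizer type are illustrative choices.

```python
# Illustrative sketch only (not the paper's design): treat the 32-bit fixed-point
# format as a mid-tread uniform quantizer with step delta over [-x_max, x_max]
# and estimate the SQNR for a zero-mean Laplacian source at several variances.
# R and x_max are assumed values here, not parameters taken from the paper.
import numpy as np

R = 32                        # assumed word length in bits
x_max = 1.0                   # assumed dynamic range (clipping threshold)
delta = 2 * x_max / 2**R      # step of the equivalent uniform quantizer

def fixed_point_quantize(x, step, limit):
    """Clip to [-limit, limit], then round to the nearest multiple of the step."""
    return np.round(np.clip(x, -limit, limit) / step) * step

def sqnr_db(x, xq):
    """SQNR in dB: 10*log10( E[x^2] / E[(x - xq)^2] )."""
    return 10 * np.log10(np.mean(x**2) / np.mean((x - xq)**2))

rng = np.random.default_rng(0)
# Sweep the weight variance over a wide range, echoing the paper's setting.
for sigma2 in (1e-6, 1e-4, 1e-2, 1.0):
    b = np.sqrt(sigma2 / 2)              # Laplacian scale: variance = 2*b^2
    x = rng.laplace(0.0, b, 100_000)
    xq = fixed_point_quantize(x, delta, x_max)
    print(f"variance = {sigma2:.0e}  ->  SQNR = {sqnr_db(x, xq):.1f} dB")
```

With a fixed step, the granular SQNR behaves roughly as 10·log10(12σ²/Δ²), i.e., it falls by about 10 dB per decade of decreasing variance, which is why a design that maximizes the average SQNR across the whole variance range, rather than at a single variance, is of interest here.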