Optimization Method and Implementation of Fake Quantization from the Perspective of Hardware Performance
Eunchong Lee, Minkyu Lee, Sanghyun Kim, Soyoung Lee, Sung-Joon Jang, Sang-Seol Lee
2023 International Technical Conference on Circuits/Systems, Computers, and Communications (ITC-CSCC), 2023-06-25. DOI: 10.1109/ITC-CSCC58803.2023.10212718
Abstract
Deep learning networks can be accelerated by reducing overall network size using quantization or pruning techniques. The best-known quantization approaches are Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT). We applied an INT8-quantized network to the design of deep learning acceleration hardware and found that network performance deteriorated due to errors introduced in the multiply/shift-based re-quantization step. This quantization error is a bigger problem during training than during inference, so an FP32 arithmetic operator is applied to prevent the resulting accuracy drop. In this paper, we investigate whether using FP32 operators can outperform multiply/shift operators under specific conditions. We do so by analyzing the data flow based on output-channel tiling and by analyzing the size of the implemented hardware.
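To make the re-quantization step concrete, the following is a minimal sketch (not taken from the paper's hardware) contrasting a multiply/shift-based re-quantization of an INT32 accumulator back to INT8 with an FP32 scaling reference. The scale value, multiplier, and shift amount are illustrative assumptions; the rounding error in approximating the real-valued scale as M / 2^s is the kind of error the abstract refers to.

```c
#include <stdint.h>
#include <stdio.h>
#include <math.h>

/* Hypothetical mult/shift re-quantization: the real-valued scale is
 * approximated as mult / 2^shift, applied in integer arithmetic. */
static int8_t requant_mult_shift(int32_t acc, int32_t mult, int shift)
{
    int64_t prod = (int64_t)acc * (int64_t)mult;       /* fixed-point multiply */
    int64_t rounded = (prod + (1LL << (shift - 1))) >> shift;  /* round to nearest */
    if (rounded > 127)  rounded = 127;                 /* clamp to INT8 range */
    if (rounded < -128) rounded = -128;
    return (int8_t)rounded;
}

/* FP32 reference path: scale in floating point, then round and clamp. */
static int8_t requant_fp32(int32_t acc, float scale)
{
    float r = roundf((float)acc * scale);
    if (r > 127.0f)  r = 127.0f;
    if (r < -128.0f) r = -128.0f;
    return (int8_t)r;
}

int main(void)
{
    /* Illustrative scale, approximated as mult / 2^31 for the integer path. */
    float scale = 0.0123f;
    int shift = 31;
    int32_t mult = (int32_t)roundf(scale * (float)(1LL << shift));

    int32_t acc = 9000;  /* a sample INT32 accumulator value */
    printf("mult/shift: %d, fp32: %d\n",
           requant_mult_shift(acc, mult, shift),
           requant_fp32(acc, scale));
    return 0;
}
```

For most inputs the two paths agree, but the integer approximation of the scale and the fixed rounding point can produce off-by-one results on some accumulator values; accumulated over many layers, such discrepancies matter more during training (where gradients compound the error) than during inference.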