{"title":"Mixed-precision Deep Neural Network Quantization With Multiple Compression Rates","authors":"Xuanda Wang, Wen Fei, Wenrui Dai, Chenglin Li, Junni Zou, H. Xiong","doi":"10.1109/DCC55655.2023.00075","DOIUrl":null,"url":null,"abstract":"Quantizing one single deep neural network into multiple compression rates (precisions) has been recently considered for flexible deployments in real-world scenarios. In this paper, we propose a novel scheme that achieves progressive bit-width allocation and joint training to simultaneously optimize mixed-precision quantized networks under multiple compression rates. Specifically, we develop a progressive bit-width allocation with switchable quantization step size to enable mixed-precision quantization based on analytic sensitivity of network layers under multiple compression rates. Furthermore, we achieve joint training for quantized networks under different compression rates via knowledge distillation to exploit their correlations based on the shared network structure. Experimental results show that the proposed scheme outperforms AdaBits [1] in various networks on CIFAR-10 and ImageNet.","PeriodicalId":209029,"journal":{"name":"2023 Data Compression Conference (DCC)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 Data Compression Conference (DCC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DCC55655.2023.00075","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Quantizing a single deep neural network to multiple compression rates (precisions) has recently been considered for flexible deployment in real-world scenarios. In this paper, we propose a novel scheme that combines progressive bit-width allocation and joint training to simultaneously optimize mixed-precision quantized networks under multiple compression rates. Specifically, we develop progressive bit-width allocation with switchable quantization step sizes to enable mixed-precision quantization based on the analytic sensitivity of network layers under multiple compression rates. Furthermore, we jointly train the quantized networks under different compression rates via knowledge distillation to exploit the correlations arising from their shared network structure. Experimental results show that the proposed scheme outperforms AdaBits [1] across various networks on CIFAR-10 and ImageNet.
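The abstract names two building blocks: quantizers with switchable (per-precision) step sizes and joint training of several precisions on shared weights with knowledge distillation from the highest-precision configuration. The PyTorch sketch below is only a hedged illustration of how such pieces might fit together under simple assumptions; the `SwitchableQuantizer` and `QuantizedMLP` classes, the bit-width list (8, 4, 2), and the distillation weighting are illustrative choices, and the paper's sensitivity-based progressive bit-width allocation is not reproduced here.

```python
# Minimal sketch (assumed, not the paper's implementation) of switchable
# quantization step sizes plus knowledge-distillation joint training.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwitchableQuantizer(nn.Module):
    """Uniform quantizer keeping one learnable step size per supported bit-width."""

    def __init__(self, bit_widths=(8, 4, 2)):
        super().__init__()
        self.bit_widths = list(bit_widths)
        # One step size per precision so each compression rate can adapt independently.
        self.steps = nn.ParameterDict(
            {str(b): nn.Parameter(torch.tensor(0.1)) for b in self.bit_widths}
        )
        self.active_bits = self.bit_widths[0]

    def forward(self, x):
        step = self.steps[str(self.active_bits)]
        qmax = 2 ** (self.active_bits - 1) - 1
        # Straight-through estimator: round in the forward pass, identity gradient.
        q = torch.clamp(torch.round(x / step), -qmax - 1, qmax)
        return (q - x / step).detach() * step + x  # equals q * step in the forward pass


class QuantizedMLP(nn.Module):
    """Toy network whose activations pass through the shared switchable quantizer."""

    def __init__(self, bit_widths=(8, 4, 2)):
        super().__init__()
        self.fc1, self.fc2 = nn.Linear(784, 256), nn.Linear(256, 10)
        self.quant = SwitchableQuantizer(bit_widths)

    def set_bits(self, bits):
        self.quant.active_bits = bits

    def forward(self, x):
        h = F.relu(self.fc1(self.quant(x)))
        return self.fc2(self.quant(h))


def joint_training_step(model, x, y, optimizer, bit_widths=(8, 4, 2), kd_weight=1.0):
    """One shared-weight update covering all precisions, distilling from the widest one."""
    optimizer.zero_grad()
    model.set_bits(bit_widths[0])
    teacher_logits = model(x)
    loss = F.cross_entropy(teacher_logits, y)
    for bits in bit_widths[1:]:
        model.set_bits(bits)
        student_logits = model(x)
        loss = loss + F.cross_entropy(student_logits, y)
        # Distill the highest-precision outputs into the lower-precision configurations.
        loss = loss + kd_weight * F.kl_div(
            F.log_softmax(student_logits, dim=1),
            F.softmax(teacher_logits.detach(), dim=1),
            reduction="batchmean",
        )
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    model = QuantizedMLP()
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
    print(joint_training_step(model, x, y, opt))
```

In this sketch all precisions share the same weights, so one optimizer step jointly updates the network for every compression rate, while the detached teacher logits keep the distillation signal flowing only from the widest configuration to the narrower ones.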