{"title":"Mixed-precision Deep Neural Network Quantization With Multiple Compression Rates","authors":"Xuanda Wang, Wen Fei, Wenrui Dai, Chenglin Li, Junni Zou, H. Xiong","doi":"10.1109/DCC55655.2023.00075","DOIUrl":null,"url":null,"abstract":"Quantizing one single deep neural network into multiple compression rates (precisions) has been recently considered for flexible deployments in real-world scenarios. In this paper, we propose a novel scheme that achieves progressive bit-width allocation and joint training to simultaneously optimize mixed-precision quantized networks under multiple compression rates. Specifically, we develop a progressive bit-width allocation with switchable quantization step size to enable mixed-precision quantization based on analytic sensitivity of network layers under multiple compression rates. Furthermore, we achieve joint training for quantized networks under different compression rates via knowledge distillation to exploit their correlations based on the shared network structure. Experimental results show that the proposed scheme outperforms AdaBits [1] in various networks on CIFAR-10 and ImageNet.","PeriodicalId":209029,"journal":{"name":"2023 Data Compression Conference (DCC)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 Data Compression Conference (DCC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DCC55655.2023.00075","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Quantizing a single deep neural network to multiple compression rates (precisions) has recently been considered for flexible deployment in real-world scenarios. In this paper, we propose a novel scheme that combines progressive bit-width allocation and joint training to simultaneously optimize mixed-precision quantized networks under multiple compression rates. Specifically, we develop progressive bit-width allocation with switchable quantization step sizes to enable mixed-precision quantization based on the analytic sensitivity of network layers under multiple compression rates. Furthermore, we jointly train the quantized networks under different compression rates via knowledge distillation to exploit the correlations arising from their shared network structure. Experimental results show that the proposed scheme outperforms AdaBits [1] across various networks on CIFAR-10 and ImageNet.
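The abstract names two building blocks: quantizers with switchable (per-precision) step sizes and joint training of several precisions on shared weights with knowledge distillation from the highest-precision configuration. The PyTorch sketch below is only a hedged illustration of how such pieces might fit together under simple assumptions; the `SwitchableQuantizer` and `QuantizedMLP` classes, the bit-width list (8, 4, 2), and the distillation weighting are illustrative choices, and the paper's sensitivity-based progressive bit-width allocation is not reproduced here.

```python
# Minimal sketch (assumed, not the paper's implementation) of switchable
# quantization step sizes plus knowledge-distillation joint training.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwitchableQuantizer(nn.Module):
    """Uniform quantizer keeping one learnable step size per supported bit-width."""

    def __init__(self, bit_widths=(8, 4, 2)):
        super().__init__()
        self.bit_widths = list(bit_widths)
        # One step size per precision so each compression rate can adapt independently.
        self.steps = nn.ParameterDict(
            {str(b): nn.Parameter(torch.tensor(0.1)) for b in self.bit_widths}
        )
        self.active_bits = self.bit_widths[0]

    def forward(self, x):
        step = self.steps[str(self.active_bits)]
        qmax = 2 ** (self.active_bits - 1) - 1
        # Straight-through estimator: round in the forward pass, identity gradient.
        q = torch.clamp(torch.round(x / step), -qmax - 1, qmax)
        return (q - x / step).detach() * step + x  # equals q * step in the forward pass


class QuantizedMLP(nn.Module):
    """Toy network whose activations pass through the shared switchable quantizer."""

    def __init__(self, bit_widths=(8, 4, 2)):
        super().__init__()
        self.fc1, self.fc2 = nn.Linear(784, 256), nn.Linear(256, 10)
        self.quant = SwitchableQuantizer(bit_widths)

    def set_bits(self, bits):
        self.quant.active_bits = bits

    def forward(self, x):
        h = F.relu(self.fc1(self.quant(x)))
        return self.fc2(self.quant(h))


def joint_training_step(model, x, y, optimizer, bit_widths=(8, 4, 2), kd_weight=1.0):
    """One shared-weight update covering all precisions, distilling from the widest one."""
    optimizer.zero_grad()
    model.set_bits(bit_widths[0])
    teacher_logits = model(x)
    loss = F.cross_entropy(teacher_logits, y)
    for bits in bit_widths[1:]:
        model.set_bits(bits)
        student_logits = model(x)
        loss = loss + F.cross_entropy(student_logits, y)
        # Distill the highest-precision outputs into the lower-precision configurations.
        loss = loss + kd_weight * F.kl_div(
            F.log_softmax(student_logits, dim=1),
            F.softmax(teacher_logits.detach(), dim=1),
            reduction="batchmean",
        )
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    model = QuantizedMLP()
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
    print(joint_training_step(model, x, y, opt))
```

In this sketch all precisions share the same weights, so one optimizer step jointly updates the network for every compression rate, while the detached teacher logits keep the distillation signal flowing only from the widest configuration to the narrower ones.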