Unbalanced Encoding in Synchronous Weight Quantization-Compression for Low-Bit Quantized Neural Network

Yuzhong Jiao, Sha Li, Peng Luo, Xiao Huo, Yiu Kei Li
{"title":"低比特量化神经网络同步权量化压缩中的不平衡编码","authors":"Yuzhong Jiao, Sha Li, Peng Luo, Xiao Huo, Yiu Kei Li","doi":"10.1109/gcaiot53516.2021.9693045","DOIUrl":null,"url":null,"abstract":"Deep neural networks (DNNs) usually have thousands of trainable parameters to ensure high accuracy. Due to large amounts of computation and memory requirements, these networks are not suitable for real-time and resource-constrained systems. Various techniques such as network pruning, weight sharing, network quantization, and weight encoding have improved computational and memory efficiency. The synchronous weight quantization-compression (SWQC) technique applies both network quantization and weight encoding to realize weight compression in the process of network quantization. This technique generates a quantized neural network (QNN) model with a good trade-off between accuracy and compression rate by choosing the proper group size, retraining epoch number, and weight threshold. To further improve the compression rate of SWQC, a new strategy for weight encoding, unbalanced encoding, is proposed in this paper. This strategy is able to compress one or multiple quantized weights into one bit, thereby achieving a higher compression rate. Experiments are performed on a 4-bit QNN using the CIFAR10 dataset. The results show that unbalanced encoding achieves a higher compression rate for the layers with large-quantity parameters. By using mixed encoding which combines balanced and unbalanced encoding in different layers can achieve a higher compression rate than using one of them only. In the experiments with CIFAR10, unbalanced encoding gets the compression rate of over 13X in the fully connected layer. By comparison, the compression rate of SWQC with the incorporation of unbalanced encoding achieves more than 5X higher than using balanced encoding only.","PeriodicalId":169247,"journal":{"name":"2021 IEEE Global Conference on Artificial Intelligence and Internet of Things (GCAIoT)","volume":"95 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Unbalanced Encoding in Synchronous Weight Quantization-Compression for Low-Bit Quantized Neural Network\",\"authors\":\"Yuzhong Jiao, Sha Li, Peng Luo, Xiao Huo, Yiu Kei Li\",\"doi\":\"10.1109/gcaiot53516.2021.9693045\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep neural networks (DNNs) usually have thousands of trainable parameters to ensure high accuracy. Due to large amounts of computation and memory requirements, these networks are not suitable for real-time and resource-constrained systems. Various techniques such as network pruning, weight sharing, network quantization, and weight encoding have improved computational and memory efficiency. The synchronous weight quantization-compression (SWQC) technique applies both network quantization and weight encoding to realize weight compression in the process of network quantization. This technique generates a quantized neural network (QNN) model with a good trade-off between accuracy and compression rate by choosing the proper group size, retraining epoch number, and weight threshold. To further improve the compression rate of SWQC, a new strategy for weight encoding, unbalanced encoding, is proposed in this paper. This strategy is able to compress one or multiple quantized weights into one bit, thereby achieving a higher compression rate. 
Experiments are performed on a 4-bit QNN using the CIFAR10 dataset. The results show that unbalanced encoding achieves a higher compression rate for the layers with large-quantity parameters. By using mixed encoding which combines balanced and unbalanced encoding in different layers can achieve a higher compression rate than using one of them only. In the experiments with CIFAR10, unbalanced encoding gets the compression rate of over 13X in the fully connected layer. By comparison, the compression rate of SWQC with the incorporation of unbalanced encoding achieves more than 5X higher than using balanced encoding only.\",\"PeriodicalId\":169247,\"journal\":{\"name\":\"2021 IEEE Global Conference on Artificial Intelligence and Internet of Things (GCAIoT)\",\"volume\":\"95 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-12-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE Global Conference on Artificial Intelligence and Internet of Things (GCAIoT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/gcaiot53516.2021.9693045\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE Global Conference on Artificial Intelligence and Internet of Things (GCAIoT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/gcaiot53516.2021.9693045","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2

Abstract

Deep neural networks (DNNs) usually have thousands of trainable parameters to ensure high accuracy. Because of their large computation and memory requirements, these networks are not suitable for real-time and resource-constrained systems. Various techniques, such as network pruning, weight sharing, network quantization, and weight encoding, have improved computational and memory efficiency. The synchronous weight quantization-compression (SWQC) technique applies both network quantization and weight encoding to realize weight compression during network quantization. It generates a quantized neural network (QNN) model with a good trade-off between accuracy and compression rate by choosing a proper group size, retraining epoch number, and weight threshold. To further improve the compression rate of SWQC, this paper proposes a new weight-encoding strategy, unbalanced encoding, which can compress one or multiple quantized weights into a single bit and thereby achieve a higher compression rate. Experiments are performed on a 4-bit QNN using the CIFAR10 dataset. The results show that unbalanced encoding achieves a higher compression rate for layers with large numbers of parameters, and that mixed encoding, which combines balanced and unbalanced encoding in different layers, achieves a higher compression rate than using either one alone. In the CIFAR10 experiments, unbalanced encoding reaches a compression rate of over 13X in the fully connected layer, and SWQC with unbalanced encoding achieves a compression rate more than 5X higher than with balanced encoding only.
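The abstract does not spell out the bit-level layout of unbalanced encoding, but its key idea, compressing one or multiple quantized weights into one bit, can be illustrated with a simplified group-flag scheme. The sketch below is a hypothetical Python illustration, not the paper's exact algorithm: the function names (encode_unbalanced, compression_rate), the group size of 4, and the choice of 0 as the dominant quantized value are all assumptions made for this example.

```python
# Hypothetical sketch of a group-flag ("unbalanced") encoding for 4-bit
# quantized weights. The actual SWQC encoding in the paper may differ;
# the group size, dominant value, and bit layout here are assumptions.

import numpy as np


def encode_unbalanced(weights_q, group_size=4, dominant=0):
    """Encode 4-bit quantized weights as a bitstream: a group whose members
    all equal the dominant value is stored as a single '0' bit; any other
    group is stored as a '1' bit followed by the raw 4-bit codes."""
    bits = []
    for start in range(0, len(weights_q), group_size):
        group = weights_q[start:start + group_size]
        if np.all(group == dominant):
            bits.append('0')  # one or multiple weights collapse to one bit
        else:
            bits.append('1')
            bits.extend(format(int(w) & 0xF, '04b') for w in group)
    return ''.join(bits)


def compression_rate(weights_q, encoded_bits, bits_per_weight=4):
    """Ratio of uncompressed 4-bit storage to the encoded bitstream length."""
    return (len(weights_q) * bits_per_weight) / len(encoded_bits)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Sparse 4-bit weights: most entries equal the dominant value (0),
    # as is typical for a large layer after thresholding in a QNN.
    w = rng.choice(np.arange(16), size=1024, p=[0.90] + [0.10 / 15] * 15)
    stream = encode_unbalanced(w)
    print(f"compression rate: {compression_rate(w, stream):.2f}x")
```

With a heavily skewed weight distribution, most groups collapse to a single flag bit, which is the kind of effect that would drive high compression rates in large fully connected layers; the figures reported in the paper come from its own encoding scheme, network, and training setup, not from this sketch.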