{"title":"Companding quantization for deep neural networks","authors":"Alaa Osama , Ahmed Madian , Mohammed E. Fouda","doi":"10.1016/j.asoc.2025.113979","DOIUrl":null,"url":null,"abstract":"<div><div>Deep learning models typically require substantial computational resources and exhibit large model sizes to achieve high accuracy, presenting challenges for deployment on resource-constrained hardware, such as mobile and edge devices. To mitigate these challenges, various strategies have been proposed, including model compression techniques aimed at reducing the precision of weights and activations. Among these, quantization stands out as a prevalent method that significantly reduces model size; however, it often results in an accuracy loss. This work focuses on narrowing the accuracy gap between full-precision models and their low-bit quantized counterparts. A comprehensive investigation is presented into the application of <span><math><mi>μ</mi></math></span>-law companding for the quantization of diverse neural network architectures. The research explores the impact of optimizing the <span><math><mi>μ</mi></math></span> parameter within the companding framework used to quantize weights and activations. Specifically, three different configurations for the <span><math><mi>μ</mi></math></span> parameter are examined: a single <span><math><mi>μ</mi></math></span> for all weights and activations, separate <span><math><mi>μ</mi></math></span> values for weights and activations, and individual <span><math><mi>μ</mi></math></span> parameters for each layer’s weights and activations. A comprehensive analysis was conducted to compare two distinct arrangements of these composite blocks, focusing on their performance metrics and overall effectiveness. Extensive experiments were conducted utilizing benchmark models, including VGG19 and AlexNet, tested on the CIFAR-10 and CIFAR-100 datasets. The results illustrate significant enhancements in top-1 prediction accuracy when compared to traditional quantization methods, with some configurations surpassing the accuracy of the full-precision model by as much as 0.11 %. For a 4-bit quantized model applied to the CIFAR-10 dataset, the loss was maintained below 1 %.</div></div>","PeriodicalId":50737,"journal":{"name":"Applied Soft Computing","volume":"185 ","pages":"Article 113979"},"PeriodicalIF":6.6000,"publicationDate":"2025-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Soft Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S156849462501292X","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
Deep learning models typically require substantial computational resources and exhibit large model sizes to achieve high accuracy, presenting challenges for deployment on resource-constrained hardware such as mobile and edge devices. To mitigate these challenges, various strategies have been proposed, including model compression techniques aimed at reducing the precision of weights and activations. Among these, quantization stands out as a prevalent method that significantly reduces model size; however, it often results in an accuracy loss. This work focuses on narrowing the accuracy gap between full-precision models and their low-bit quantized counterparts. A comprehensive investigation is presented into the application of μ-law companding for the quantization of diverse neural network architectures. The research explores the impact of optimizing the μ parameter within the companding framework used to quantize weights and activations. Specifically, three different configurations for the μ parameter are examined: a single μ for all weights and activations, separate μ values for weights and activations, and individual μ parameters for each layer's weights and activations. A comprehensive analysis was conducted to compare two distinct arrangements of these composite blocks, focusing on their performance metrics and overall effectiveness. Extensive experiments were conducted utilizing benchmark models, including VGG19 and AlexNet, tested on the CIFAR-10 and CIFAR-100 datasets. The results illustrate significant enhancements in top-1 prediction accuracy when compared to traditional quantization methods, with some configurations surpassing the accuracy of the full-precision model by as much as 0.11%. For a 4-bit quantized model applied to the CIFAR-10 dataset, the accuracy loss was maintained below 1%.
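For readers unfamiliar with μ-law companding, the sketch below illustrates the general idea behind companding quantization: values are compressed with the μ-law curve, quantized uniformly in the companded domain, and then expanded back, giving finer resolution to small magnitudes. This is a minimal NumPy illustration of the standard μ-law formulas, not the authors' implementation; the function names, the max-abs normalization step, and the choice μ = 255 are assumptions for illustration (the paper instead optimizes μ globally, per tensor type, or per layer).

```python
# Hypothetical sketch of mu-law companding quantization (not the paper's code).
import numpy as np

def mu_law_compand(x, mu):
    """Map x in [-1, 1] through the standard mu-law compressor."""
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def mu_law_expand(y, mu):
    """Inverse of mu_law_compand."""
    return np.sign(y) * ((1.0 + mu) ** np.abs(y) - 1.0) / mu

def compand_quantize(x, bits=4, mu=255.0):
    """Quantize a tensor: compand, uniformly quantize to `bits`, expand back."""
    scale = np.max(np.abs(x)) + 1e-12          # normalize to [-1, 1] (assumed scheme)
    y = mu_law_compand(x / scale, mu)          # compress: more levels near zero
    levels = 2 ** (bits - 1) - 1               # symmetric signed grid
    y_q = np.round(y * levels) / levels        # uniform quantization in companded domain
    return mu_law_expand(y_q, mu) * scale      # expand back to the original range

# Example: 4-bit companded quantization of random "weights"
w = np.random.randn(8).astype(np.float32) * 0.1
print(compand_quantize(w, bits=4, mu=255.0))
```

In this arrangement, small weights and activations, which dominate typical neural-network value distributions, suffer less rounding error than under plain uniform quantization at the same bit width.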
Journal description:
Applied Soft Computing is an international journal promoting an integrated view of soft computing to solve real-life problems. Its focus is to publish the highest-quality research on the application and convergence of Fuzzy Logic, Neural Networks, Evolutionary Computing, Rough Sets, and other similar techniques to address real-world complexities.
Applied Soft Computing is a rolling publication: articles are published as soon as the editor-in-chief has accepted them. The website is therefore updated continuously with new articles, and publication times are short.