{"title":"Companding quantization for deep neural networks","authors":"Alaa Osama , Ahmed Madian , Mohammed E. Fouda","doi":"10.1016/j.asoc.2025.113979","DOIUrl":null,"url":null,"abstract":"<div><div>Deep learning models typically require substantial computational resources and exhibit large model sizes to achieve high accuracy, presenting challenges for deployment on resource-constrained hardware, such as mobile and edge devices. To mitigate these challenges, various strategies have been proposed, including model compression techniques aimed at reducing the precision of weights and activations. Among these, quantization stands out as a prevalent method that significantly reduces model size; however, it often results in an accuracy loss. This work focuses on narrowing the accuracy gap between full-precision models and their low-bit quantized counterparts. A comprehensive investigation is presented into the application of <span><math><mi>μ</mi></math></span>-law companding for the quantization of diverse neural network architectures. The research explores the impact of optimizing the <span><math><mi>μ</mi></math></span> parameter within the companding framework used to quantize weights and activations. Specifically, three different configurations for the <span><math><mi>μ</mi></math></span> parameter are examined: a single <span><math><mi>μ</mi></math></span> for all weights and activations, separate <span><math><mi>μ</mi></math></span> values for weights and activations, and individual <span><math><mi>μ</mi></math></span> parameters for each layer’s weights and activations. A comprehensive analysis was conducted to compare two distinct arrangements of these composite blocks, focusing on their performance metrics and overall effectiveness. Extensive experiments were conducted utilizing benchmark models, including VGG19 and AlexNet, tested on the CIFAR-10 and CIFAR-100 datasets. The results illustrate significant enhancements in top-1 prediction accuracy when compared to traditional quantization methods, with some configurations surpassing the accuracy of the full-precision model by as much as 0.11 %. For a 4-bit quantized model applied to the CIFAR-10 dataset, the loss was maintained below 1 %.</div></div>","PeriodicalId":50737,"journal":{"name":"Applied Soft Computing","volume":"185 ","pages":"Article 113979"},"PeriodicalIF":6.6000,"publicationDate":"2025-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Soft Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S156849462501292X","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
Deep learning models typically require substantial computational resources and exhibit large model sizes to achieve high accuracy, presenting challenges for deployment on resource-constrained hardware such as mobile and edge devices. To mitigate these challenges, various strategies have been proposed, including model compression techniques aimed at reducing the precision of weights and activations. Among these, quantization stands out as a prevalent method that significantly reduces model size; however, it often results in an accuracy loss. This work focuses on narrowing the accuracy gap between full-precision models and their low-bit quantized counterparts. A comprehensive investigation is presented into the application of μ-law companding for the quantization of diverse neural network architectures. The research explores the impact of optimizing the μ parameter within the companding framework used to quantize weights and activations. Specifically, three different configurations for the μ parameter are examined: a single μ for all weights and activations, separate μ values for weights and activations, and individual μ parameters for each layer's weights and activations. A comprehensive analysis was conducted to compare two distinct arrangements of these composite blocks, focusing on their performance metrics and overall effectiveness. Extensive experiments were conducted utilizing benchmark models, including VGG19 and AlexNet, tested on the CIFAR-10 and CIFAR-100 datasets. The results illustrate significant enhancements in top-1 prediction accuracy when compared to traditional quantization methods, with some configurations surpassing the accuracy of the full-precision model by as much as 0.11%. For a 4-bit quantized model applied to the CIFAR-10 dataset, the accuracy loss was maintained below 1%.
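For readers unfamiliar with μ-law companding, the sketch below illustrates the general idea behind companding quantization: values are compressed with the μ-law curve, quantized uniformly in the companded domain, and then expanded back, giving finer resolution to small magnitudes. This is a minimal NumPy illustration of the standard μ-law formulas, not the authors' implementation; the function names, the max-abs normalization step, and the choice μ = 255 are assumptions for illustration (the paper instead optimizes μ globally, per tensor type, or per layer).

```python
# Hypothetical sketch of mu-law companding quantization (not the paper's code).
import numpy as np

def mu_law_compand(x, mu):
    """Map x in [-1, 1] through the standard mu-law compressor."""
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def mu_law_expand(y, mu):
    """Inverse of mu_law_compand."""
    return np.sign(y) * ((1.0 + mu) ** np.abs(y) - 1.0) / mu

def compand_quantize(x, bits=4, mu=255.0):
    """Quantize a tensor: compand, uniformly quantize to `bits`, expand back."""
    scale = np.max(np.abs(x)) + 1e-12          # normalize to [-1, 1] (assumed scheme)
    y = mu_law_compand(x / scale, mu)          # compress: more levels near zero
    levels = 2 ** (bits - 1) - 1               # symmetric signed grid
    y_q = np.round(y * levels) / levels        # uniform quantization in companded domain
    return mu_law_expand(y_q, mu) * scale      # expand back to the original range

# Example: 4-bit companded quantization of random "weights"
w = np.random.randn(8).astype(np.float32) * 0.1
print(compand_quantize(w, bits=4, mu=255.0))
```

In this arrangement, small weights and activations, which dominate typical neural-network value distributions, suffer less rounding error than under plain uniform quantization at the same bit width.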
Journal description:
Applied Soft Computing is an international journal promoting an integrated view of soft computing to solve real-life problems. Its focus is to publish the highest-quality research on the application and convergence of Fuzzy Logic, Neural Networks, Evolutionary Computing, Rough Sets, and other similar techniques to address real-world complexities.
Applied Soft Computing is a rolling publication: articles are published as soon as the editor-in-chief has accepted them. The website is therefore updated continuously with new articles, and publication times are short.