Towards accurate post-training quantization for reparameterized models

IF 3.4 · CAS Zone 2 (Computer Science) · JCR Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Applied Intelligence Pub Date : 2025-03-31 DOI:10.1007/s10489-025-06418-0

Luoming Zhang, Yefei He, Wen Fei, Zhenyu Lou, Weijia Wu, Yangwei Ying, Hong Zhou

Applied Intelligence, volume 55, issue 7. Article page: https://link.springer.com/article/10.1007/s10489-025-06418-0

Citations: 0

Abstract

Model reparameterization is a widely accepted technique for improving inference speed without compromising performance. However, current Post-training Quantization (PTQ) methods often lead to significant accuracy degradation when applied to reparameterized models. This is primarily caused by channel-specific and sample-specific outliers, which appear only at specific samples and channels and affect the selection of quantization parameters. To address this issue, we propose RepAPQ, a novel framework that preserves the accuracy of quantized reparameterized models. Unlike previous frameworks that use Mean Squared Error (MSE) as the measurement, we use Mean Absolute Error (MAE) to mitigate the influence of outliers on quantization parameters. Our framework consists of two core components: Quantization Protecting Reparameterization and Across-block Calibration. For effective calibration, Quantization Protecting Reparameterization combines multiple branches into a single convolution with an affine layer. During training, the affine layer accelerates convergence and amplifies the output of the convolution to better accommodate samples with outliers. Additionally, Across-block Calibration uses the measurement of the stage output as supervision to address the gradient problem introduced by MAE and to strengthen the interlayer correlation of the quantization parameters. Comprehensive experiments demonstrate the effectiveness of RepAPQ across various models and tasks. Our framework outperforms previous methods by approximately 1% for 8-bit PTQ and 2% for 6-bit PTQ. The code is available at https://github.com/ilur98/DLMC-QUANT.
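
The claim that MAE is less sensitive to outliers than MSE when choosing quantization parameters can be illustrated with a small sketch. This is not the RepAPQ implementation: the symmetric quantizer, the grid search, the helper names (`fake_quantize`, `calibration_error`, `search_scale`), and the synthetic activations are all assumptions made for illustration only.

```python
import random

def fake_quantize(x, scale, n_bits):
    """Symmetric uniform quantize-then-dequantize of a single value."""
    qmax = 2 ** (n_bits - 1) - 1
    q = max(-qmax - 1, min(qmax, round(x / scale)))
    return q * scale

def calibration_error(values, scale, n_bits, p):
    """Mean of |x - Q(x)|**p; p=2 is MSE, p=1 is MAE."""
    total = sum(abs(v - fake_quantize(v, scale, n_bits)) ** p for v in values)
    return total / len(values)

def search_scale(values, n_bits, p, steps=100):
    """Grid-search the clipping range and return the scale with the lowest error."""
    qmax = 2 ** (n_bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    best_scale, best_err = None, float("inf")
    for i in range(1, steps + 1):
        scale = (max_abs * i / steps) / qmax
        err = calibration_error(values, scale, n_bits, p)
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale

# Hypothetical calibration data: a Gaussian bulk plus a few large outliers,
# standing in for the sample-specific outliers described in the abstract.
rng = random.Random(0)
acts = [rng.gauss(0.0, 1.0) for _ in range(2000)]
acts += [rng.uniform(30.0, 50.0) for _ in range(10)]

scale_mse = search_scale(acts, n_bits=6, p=2)  # range stretched toward the outliers
scale_mae = search_scale(acts, n_bits=6, p=1)  # range stays close to the bulk

print("scale chosen by MSE:", scale_mse)
print("scale chosen by MAE:", scale_mae)
```

In this toy setting the squared error of a clipped outlier dominates the MSE objective, so the MSE search widens the range and coarsens the grid for the bulk of the values, while the MAE search keeps a finer grid and accepts clipping the rare outliers. The abstract also notes that MAE introduces a gradient problem, which this gradient-free grid search does not touch.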


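Source journal: Applied Intelligence (Engineering & Technology, Computer Science: Artificial Intelligence)
CiteScore: 6.60
Self-citation rate: 20.80%
Articles published: 1361
Review time: 5.9 months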

Journal description: With a focus on research in artificial intelligence and neural networks, this journal addresses issues involving solutions of real-life manufacturing, defense, management, government and industrial problems which are too complex to be solved through conventional approaches and require the simulation of intelligent thought processes, heuristics, applications of knowledge, and distributed and parallel processing. The integration of these multiple approaches in solving complex problems is of particular importance. The journal presents new and original research and technological developments, addressing real and complex issues applicable to difficult problems. It provides a medium for exchanging scientific research and technological achievements accomplished by the international community.