IF 3.4 · CAS Tier 2 (Computer Science) · JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Luoming Zhang, Yefei He, Wen Fei, Zhenyu Lou, Weijia Wu, Yangwei Ying, Hong Zhou
{"title":"Towards accurate post-training quantization for reparameterized models","authors":"Luoming Zhang,&nbsp;Yefei He,&nbsp;Wen Fei,&nbsp;Zhenyu Lou,&nbsp;Weijia Wu,&nbsp;Yangwei Ying,&nbsp;Hong Zhou","doi":"10.1007/s10489-025-06418-0","DOIUrl":null,"url":null,"abstract":"<div><p>Model reparameterization is a widely accepted technique for improving inference speed without compromising performance. However, current Post-training Quantization (PTQ) methods often lead to significant accuracy degradation when applied to reparameterized models. This is primarily caused by channel-specific and sample-specific outliers, which appear only at specific samples and channels and impact on the selection of quantization parameters. To address this issue, we propose RepAPQ, a novel framework that preserves the accuracy of quantized reparameterization models. Different from previous frameworks using Mean Squared Error (MSE) as a measurement, we utilize Mean Absolute Error (MAE) to mitigate the influence of outliers on quantization parameters. Our framework consists of two core components: Quantization Protecting Reparameterization and Across-block Calibration. For effective calibration, Quantization Protecting Reparameterization combines multiple branches into a single convolution with an affine layer. During training, the affine layer accelerates convergence and amplifies the output of the convolution to better accommodate samples with outliers. Additionally, Across-block Calibration leverages the measurement of stage output as supervision to address the gradient problem introduced by MAE and enhance the interlayer correlation with quantization parameters. Comprehensive experiments demonstrate the effectiveness of RepAPQ across various models and tasks. Our framework outperforms previous methods by approximately 1% for 8-bit PTQ and 2% for 6-bit PTQ, showcasing its superior performance. The code is available at https://github.com/ilur98/DLMC-QUANT.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 7","pages":""},"PeriodicalIF":3.4000,"publicationDate":"2025-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Intelligence","FirstCategoryId":"94","ListUrlMain":"https://link.springer.com/article/10.1007/s10489-025-06418-0","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Towards accurate post-training quantization for reparameterized models

Model reparameterization is a widely adopted technique for improving inference speed without compromising performance. However, current Post-training Quantization (PTQ) methods often cause significant accuracy degradation when applied to reparameterized models. This degradation is primarily caused by channel-specific and sample-specific outliers, which appear only in certain samples and channels and distort the selection of quantization parameters. To address this issue, we propose RepAPQ, a novel framework that preserves the accuracy of quantized reparameterized models. Unlike previous frameworks that use Mean Squared Error (MSE) as the calibration measure, we use Mean Absolute Error (MAE) to mitigate the influence of outliers on quantization parameters. Our framework consists of two core components: Quantization Protecting Reparameterization and Across-block Calibration. For effective calibration, Quantization Protecting Reparameterization merges the multiple branches into a single convolution followed by an affine layer. During training, the affine layer accelerates convergence and amplifies the convolution's output to better accommodate samples with outliers. Additionally, Across-block Calibration uses the error measured at the stage output as supervision, which addresses the gradient problem introduced by MAE and enhances the inter-layer correlation of quantization parameters. Comprehensive experiments demonstrate the effectiveness of RepAPQ across various models and tasks: it outperforms previous methods by approximately 1% for 8-bit PTQ and 2% for 6-bit PTQ. The code is available at https://github.com/ilur98/DLMC-QUANT.
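The abstract's key design choice, calibrating with MAE instead of MSE so that rare outliers do not dominate the choice of quantization parameters, is easy to illustrate with a toy scale search. The following sketch is illustrative only and is not the RepAPQ implementation; the symmetric quantizer, the grid search, and the injected outliers are all assumptions made for demonstration.

```python
# Toy comparison of MSE- vs MAE-based calibration of a quantization scale.
# Illustrative sketch only -- not the RepAPQ implementation.
import torch

def quantize(x, scale, n_bits=8):
    # Symmetric uniform quantizer: scale down, round to the integer grid,
    # clamp to the representable range, then dequantize.
    qmax = 2 ** (n_bits - 1) - 1
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
    return q * scale

def search_scale(x, n_bits=8, metric="mae", n_candidates=100):
    # Grid-search the scale that minimizes the chosen reconstruction error
    # over a calibration tensor.
    qmax = 2 ** (n_bits - 1) - 1
    best_scale, best_err = None, float("inf")
    for i in range(1, n_candidates + 1):
        scale = x.abs().max() * i / (n_candidates * qmax)
        diff = quantize(x, scale, n_bits) - x
        err = diff.abs().mean() if metric == "mae" else diff.pow(2).mean()
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale

# A well-behaved bulk plus a few sample-specific outliers: the squared error
# is dominated by the outliers and drags the scale up (coarsening the bulk),
# while the absolute error keeps the scale matched to the bulk of the values.
x = torch.randn(10_000)
x[:5] = 80.0
print("scale chosen with MSE:", search_scale(x, metric="mse").item())
print("scale chosen with MAE:", search_scale(x, metric="mae").item())
```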

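The Quantization Protecting Reparameterization step, merging a block's branches into a single convolution followed by an affine layer, can likewise be pictured with a RepVGG-style block. Below is a minimal sketch assuming a 3x3 conv+BN branch and a 1x1 conv+BN branch with matching channel counts; the per-channel scale-and-bias affine layer and the names fuse_conv_bn and MergedRepBlock are hypothetical stand-ins for illustration, not the paper's code.

```python
# Sketch of RepVGG-style branch fusion plus a trainable per-channel affine
# layer (an assumed stand-in for the paper's "convolution with an affine
# layer"); not the authors' implementation.
import torch
import torch.nn as nn

def fuse_conv_bn(conv, bn):
    # Fold a BatchNorm2d into the preceding conv: y = gamma*(Wx+b-mu)/std + beta
    # becomes y = W'x + b' with W' = W*gamma/std, b' = beta + (b-mu)*gamma/std.
    std = (bn.running_var + bn.eps).sqrt()
    w = conv.weight * (bn.weight / std).reshape(-1, 1, 1, 1)
    b = bn.bias - bn.running_mean * bn.weight / std
    if conv.bias is not None:
        b = b + conv.bias * bn.weight / std
    return w, b

class MergedRepBlock(nn.Module):
    def __init__(self, conv3, bn3, conv1, bn1):
        super().__init__()
        w3, b3 = fuse_conv_bn(conv3, bn3)
        w1, b1 = fuse_conv_bn(conv1, bn1)
        # Zero-pad the 1x1 kernel to 3x3 so the two branches sum into one conv.
        w1 = nn.functional.pad(w1, [1, 1, 1, 1])
        out_ch = w3.shape[0]
        self.conv = nn.Conv2d(conv3.in_channels, out_ch, 3, padding=1)
        self.conv.weight.data = (w3 + w1).detach()
        self.conv.bias.data = (b3 + b1).detach()
        # Trainable per-channel affine layer appended after the merged conv;
        # during calibration it can amplify outputs for outlier-heavy samples.
        self.scale = nn.Parameter(torch.ones(out_ch))
        self.bias = nn.Parameter(torch.zeros(out_ch))

    def forward(self, x):
        y = self.conv(x)
        return y * self.scale.view(1, -1, 1, 1) + self.bias.view(1, -1, 1, 1)
```

Per the abstract, Across-block Calibration would then tune such affine parameters together with the quantization parameters against the full-precision stage outputs under the MAE objective; the exact objective and training schedule are specified in the paper.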
Source journal: Applied Intelligence (Engineering & Technology - Computer Science: Artificial Intelligence)
CiteScore: 6.60
Self-citation rate: 20.80%
Articles published: 1361
Review time: 5.9 months
Journal description: With a focus on research in artificial intelligence and neural networks, this journal addresses issues involving solutions of real-life manufacturing, defense, management, government and industrial problems which are too complex to be solved through conventional approaches and require the simulation of intelligent thought processes, heuristics, applications of knowledge, and distributed and parallel processing. The integration of these multiple approaches in solving complex problems is of particular importance. The journal presents new and original research and technological developments, addressing real and complex issues applicable to difficult problems. It provides a medium for exchanging scientific research and technological achievements accomplished by the international community.