Jiaqi Zhao,Chao Zeng,Ming Wang,Linxuan Han,Yuzhang Shang,Miao Zhang,Liqiang Nie
{"title":"LRQuant: A Unified and Learnable Framework to Post-training Quantization for Transformer-based Large Foundation Models.","authors":"Jiaqi Zhao,Chao Zeng,Ming Wang,Linxuan Han,Yuzhang Shang,Miao Zhang,Liqiang Nie","doi":"10.1109/tpami.2025.3599479","DOIUrl":null,"url":null,"abstract":"Post-training quantization (PTQ) for transformer-based large foundation models (LFMs) significantly accelerates model inference and relieves memory constraints, without incurring model training. However, existing methods face three main issues: 1) The scaling factors, which are commonly used in scale reparameterization based weight-activation quantization for mitigating the quantization errors, are mostly hand-crafted defined which may lead to suboptimal results; 2) The formulation of current quantization error defined by L2-norm ignores the directional shifts after quantization; 3) Most methods are devised tailored for single scenario, i.e., only evaluated on LLMs or only designed for weight-only quantization, which lacks of a comprehensive evaluation on diverse benchmarks and a broad application scope. To address these challenges, this paper introduces a unified Learnable and Robust post-training Quantization framework for transformer based LFMs and various quantization scenarios, called LRQuant. Firstly, we consider an efficient block-wise learnable paradigm to find optimal scaling factors which are initialized by logarithmic activation equivalent and get suitable clipping range of quantization steps. In addition, we empirically find that only relying on MSE loss could hardly lead to optimal quantization results, so we reformulate the quantization error and then propose a novel loss function based on the negative logarithm of cosine similarity (NLC loss) between outputs of full-precision and quantized block. To fully investigate the potentiality of our learnable paradigm, we propose a more superior version LRQuant+. Specifically, we first propose a dynamically weighted scheme to balance MSE and NLC loss, and then devise learnable rotation vectors to further directly reduce directional gaps. In addition, we improve the block-wise optimization framework into a novel two-branch nature which jointly considers the error propagation and homologous reconstruction error. Extensive experiments demonstrate the superiority of our LRQuant and LRQuant+, as well as their unified effectiveness across various LFMs for both weight-activation and weight-only quantization, especially under challenging quantization scenarios, i.e., W4A4 and W2A16 on LLMs, ViTS, and MLLMs. Codes are available at https://github.com/zjq0455/LRQuant.","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"39 1","pages":""},"PeriodicalIF":18.6000,"publicationDate":"2025-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Pattern Analysis and Machine Intelligence","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1109/tpami.2025.3599479","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
Post-training quantization (PTQ) for transformer-based large foundation models (LFMs) significantly accelerates model inference and relieves memory constraints, without incurring model training. However, existing methods face three main issues: 1) The scaling factors, which are commonly used in scale reparameterization based weight-activation quantization for mitigating the quantization errors, are mostly hand-crafted defined which may lead to suboptimal results; 2) The formulation of current quantization error defined by L2-norm ignores the directional shifts after quantization; 3) Most methods are devised tailored for single scenario, i.e., only evaluated on LLMs or only designed for weight-only quantization, which lacks of a comprehensive evaluation on diverse benchmarks and a broad application scope. To address these challenges, this paper introduces a unified Learnable and Robust post-training Quantization framework for transformer based LFMs and various quantization scenarios, called LRQuant. Firstly, we consider an efficient block-wise learnable paradigm to find optimal scaling factors which are initialized by logarithmic activation equivalent and get suitable clipping range of quantization steps. In addition, we empirically find that only relying on MSE loss could hardly lead to optimal quantization results, so we reformulate the quantization error and then propose a novel loss function based on the negative logarithm of cosine similarity (NLC loss) between outputs of full-precision and quantized block. To fully investigate the potentiality of our learnable paradigm, we propose a more superior version LRQuant+. Specifically, we first propose a dynamically weighted scheme to balance MSE and NLC loss, and then devise learnable rotation vectors to further directly reduce directional gaps. In addition, we improve the block-wise optimization framework into a novel two-branch nature which jointly considers the error propagation and homologous reconstruction error. Extensive experiments demonstrate the superiority of our LRQuant and LRQuant+, as well as their unified effectiveness across various LFMs for both weight-activation and weight-only quantization, especially under challenging quantization scenarios, i.e., W4A4 and W2A16 on LLMs, ViTS, and MLLMs. Codes are available at https://github.com/zjq0455/LRQuant.
期刊介绍:
The IEEE Transactions on Pattern Analysis and Machine Intelligence publishes articles on all traditional areas of computer vision and image understanding, all traditional areas of pattern analysis and recognition, and selected areas of machine intelligence, with a particular emphasis on machine learning for pattern analysis. Areas such as techniques for visual search, document and handwriting analysis, medical image analysis, video and image sequence analysis, content-based retrieval of image and video, face and gesture recognition and relevant specialized hardware and/or software architectures are also covered.