ORQ-ViT: Outlier resilient Post Training Quantization for vision transformers via outlier decomposition
Xinyu He, Ye Lu, Hang Liu, Cheng Gong, Wenxuan He
Journal of Systems Architecture, Volume 168, Article 103530. Published 2025-08-27. DOI: 10.1016/j.sysarc.2025.103530
https://www.sciencedirect.com/science/article/pii/S1383762125002024
Citations: 0
Abstract
Post-training quantization (PTQ) is critical for deploying Vision Transformers (ViTs) on resource-constrained devices. However, outliers clustered in a few activation channels tend to dominate the quantization range and cause significant accuracy degradation. To address this challenge, this paper proposes ORQ-ViT, an outlier-resilient PTQ method based on outlier decomposition. Its core idea is to decompose the outliers clustered in outlier channels into isolated outliers that can be easily excluded, thus alleviating their adverse impact. Specifically, we decompose activations along the patch token dimension. Since there are very few outlier channels, each decomposed row usually contains only a few isolated outliers, which can be easily identified and filtered. We further design an adaptive quantization range determination strategy, applied during quantization parameter initialization, that prevents outliers from serving as boundary values of the quantization range. ORQ-ViT improves the utilization of quantization levels, producing activations with higher quantization resolution and thereby achieving higher accuracy. Additionally, ORQ-ViT supports pure integer matrix multiplications, which ensures the inference efficiency of quantized ViTs on edge hardware. Extensive experiments demonstrate that our method achieves state-of-the-art (SOTA) accuracy across various ViT variants under multiple low bit-width scenarios. For image classification, the top-1 accuracy of ORQ-ViT outperforms that of the SOTA methods by an average of 2.26% at W4A4. For object detection and instance segmentation, ORQ-ViT also delivers highly competitive results. We also evaluate the inference efficiency of pure integer matrix multiplications, and the results show that our method can achieve up to 2.1× speedup.
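The effect described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the abstract does not state how outliers are identified or how the adaptive range is chosen, so the median-absolute-deviation rule, the threshold `k`, and the helper names `filtered_range` and `quantize` are all assumptions made for illustration. The sketch treats each patch token as one row, drops the isolated outliers in that row, and derives the quantization range from the values that remain.

```python
import numpy as np

def filtered_range(x, k=6.0):
    """Quantization range taken from token rows after dropping isolated outliers.

    x: activations of shape (tokens, channels).
    k: hypothetical threshold, in units of each row's median absolute deviation.
    """
    med = np.median(x, axis=1, keepdims=True)
    mad = np.median(np.abs(x - med), axis=1, keepdims=True) + 1e-8
    keep = np.abs(x - med) <= k * mad              # True for inliers in each row
    row_lo = np.where(keep, x, np.inf).min(axis=1)   # per-row minimum over inliers
    row_hi = np.where(keep, x, -np.inf).max(axis=1)  # per-row maximum over inliers
    return float(row_lo.min()), float(row_hi.max())

def quantize(x, lo, hi, bits=4):
    """Uniform asymmetric fake quantization; values outside [lo, hi] are clipped."""
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels
    q = np.clip(np.round((x - lo) / scale), 0, levels)
    return q * scale + lo

# Synthetic activations: 16 tokens, 64 channels, one outlier channel (assumed data).
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(16, 64))
x[:, 5] += 40.0

naive = quantize(x, x.min(), x.max())   # range dominated by the outlier channel
lo, hi = filtered_range(x)
robust = quantize(x, lo, hi)            # range set by the non-outlier values

inlier = np.ones(x.shape[1], dtype=bool)
inlier[5] = False
err_naive = np.abs(naive - x)[:, inlier].mean()
err_robust = np.abs(robust - x)[:, inlier].mean()
print(f"mean inlier error, min/max range:  {err_naive:.3f}")
print(f"mean inlier error, filtered range: {err_robust:.3f}")
```

With the outlier channel excluded from the range, the 16 available 4-bit levels are spent on the bulk of the activations, so the non-outlier channels are reconstructed with a smaller error. The outlier channel itself is simply clipped in this sketch; how ORQ-ViT handles those values is not described in the abstract.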
About the journal:
The Journal of Systems Architecture: Embedded Software Design (JSA) is a journal covering all design and architectural aspects related to embedded systems and software. It ranges from the microarchitecture level via the system software level up to the application-specific architecture level. Aspects such as real-time systems, operating systems, FPGA programming, programming languages, communications (limited to analysis and the software stack), mobile systems, parallel and distributed architectures as well as additional subjects in the computer and system architecture area will fall within the scope of this journal. Technology will not be a main focus, but its use and relevance to particular designs will be. Case studies are welcome but must contribute more than just a design for a particular piece of software.
Design automation of such systems, including methodologies, techniques, and tools for their design, as well as novel designs of software components, falls within the scope of this journal. Novel applications that use embedded systems are also central to this journal. While hardware is not a part of this journal, hardware/software co-design methods that consider the interplay between software and hardware components with an emphasis on software are also relevant here.