ORQ-ViT: Outlier resilient Post Training Quantization for vision transformers via outlier decomposition

IF 4.1 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

Journal of Systems Architecture Pub Date: 2025-08-27 DOI: 10.1016/j.sysarc.2025.103530

Xinyu He, Ye Lu, Hang Liu, Cheng Gong, Wenxuan He

Volume 168, Article 103530. Journal Article. https://www.sciencedirect.com/science/article/pii/S1383762125002024

Citations: 0

Abstract

Post-training quantization (PTQ) is critical for deploying Vision Transformers (ViTs) on resource-constrained devices. However, outliers clustered in a few activation channels tend to dominate the quantization range and cause significant accuracy degradation. To address this challenge, this paper proposes ORQ-ViT, an outlier-resilient PTQ method based on outlier decomposition. Its core idea is to decompose the outliers clustered in outlier channels into isolated outliers that can be easily excluded, which alleviates their adverse impact. Specifically, we decompose activations along the patch token dimension. Because there are very few outlier channels, each decomposed row usually contains only a few isolated outliers, which can be easily identified and filtered. We further design an adaptive strategy for determining the quantization range during quantization-parameter initialization, which prevents outliers from serving as the boundary values of the range. As a result, ORQ-ViT improves the utilization of quantization levels, yielding activations with higher quantization resolution and therefore higher accuracy. Additionally, ORQ-ViT supports pure integer matrix multiplication, which preserves the inference efficiency of quantized ViTs on edge hardware. Extensive experiments demonstrate that our method achieves state-of-the-art (SOTA) accuracy across various ViT variants under multiple low-bit-width settings. For image classification, the top-1 accuracy of ORQ-ViT exceeds that of the SOTA methods by an average of 2.26% at W4A4 (4-bit weights and 4-bit activations). ORQ-ViT also delivers highly competitive results on object detection and instance segmentation. Finally, we evaluate the inference efficiency of pure integer matrix multiplication, and the results show that our method achieves a speedup of up to 2.1×.
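The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea it describes, not the authors' method. It assumes a symmetric uniform 4-bit quantizer applied to a single token row, and it stands in for the paper's adaptive range strategy with a hypothetical `skip` parameter that simply leaves the largest-magnitude entries out of the range calculation. The helper names (`quantize_row`, `mean_abs_error`, `int_dot`) and the synthetic data are likewise assumptions made for illustration.

```python
# Illustrative sketch only; this is NOT the published ORQ-ViT algorithm.
import random


def quantize_row(row, bits=4, skip=0):
    """Uniformly quantize one token row to signed integers.

    `skip` is a hypothetical knob: the number of largest-magnitude entries
    left out when choosing the range. Skipped entries are clipped.
    """
    qmax = 2 ** (bits - 1) - 1                     # 7 for signed 4-bit
    mags = sorted(abs(v) for v in row)
    bound = mags[len(mags) - 1 - skip] or 1.0      # guard against a zero bound
    scale = bound / qmax
    q = [max(-qmax, min(qmax, round(v / scale))) for v in row]
    return q, scale


def mean_abs_error(row, q, scale, ignore):
    """Mean absolute reconstruction error over channels not in `ignore`."""
    kept = [i for i in range(len(row)) if i not in ignore]
    return sum(abs(row[i] - q[i] * scale) for i in kept) / len(kept)


def int_dot(q_a, s_a, q_w, s_w):
    """Integer accumulation first, then one floating-point rescale."""
    acc = sum(a * w for a, w in zip(q_a, q_w))     # integers only
    return acc * s_a * s_w


random.seed(0)
channels, outlier_channel = 64, 5

# One token row: small values everywhere except a single outlier channel.
row = [random.uniform(-1.0, 1.0) for _ in range(channels)]
row[outlier_channel] = 40.0

q_naive, s_naive = quantize_row(row, skip=0)       # outlier sets the range
q_clip, s_clip = quantize_row(row, skip=1)         # outlier excluded from the range

err_naive = mean_abs_error(row, q_naive, s_naive, {outlier_channel})
err_clip = mean_abs_error(row, q_clip, s_clip, {outlier_channel})

# A weight column quantized the same way, to show the integer product.
weights = [random.uniform(-0.5, 0.5) for _ in range(channels)]
q_w, s_w = quantize_row(weights)
y_int = int_dot(q_clip, s_clip, q_w, s_w)
y_ref = sum((a * s_clip) * (w * s_w) for a, w in zip(q_clip, q_w))

print("levels used (naive, clipped):", len(set(q_naive)), len(set(q_clip)))
print("mean abs error (naive, clipped):", err_naive, err_clip)
print("integer vs dequantized dot product:", y_int, y_ref)
```

When the outlier sets the range, every ordinary channel in this toy row rounds to zero, so almost none of the available quantization levels are used. Excluding the single outlier from the range lets the remaining values spread across the levels. Because the row scale and the weight scale are constants within one dot product, they can be applied once after an integer-only accumulation, which is the property that makes integer matrix multiplication possible under these assumptions.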


Source journal

Journal of Systems Architecture (Engineering & Technology: Computer Science, Hardware)

CiteScore: 8.70

Self-citation rate: 15.60%

Publication volume: 226

Review time: 46 days

About the journal: The Journal of Systems Architecture: Embedded Software Design (JSA) is a journal covering all design and architectural aspects related to embedded systems and software. It ranges from the microarchitecture level via the system software level up to the application-specific architecture level. Aspects such as real-time systems, operating systems, FPGA programming, programming languages, communications (limited to analysis and the software stack), mobile systems, and parallel and distributed architectures, as well as additional subjects in the computer and system architecture area, fall within the scope of this journal. Technology will not be a main focus, but its use and relevance to particular designs will be. Case studies are welcome but must contribute more than just a design for a particular piece of software. Design automation of such systems, including methodologies, techniques and tools for their design, as well as novel designs of software components, falls within the scope of this journal. Novel applications that use embedded systems are also central to this journal. While hardware is not a part of this journal, hardware/software co-design methods that consider the interplay between software and hardware components with an emphasis on software are also relevant here.