ADFQ-ViT: Activation-Distribution-Friendly Post-Training Quantization for Vision Transformers

IF 6.3 | CAS Q1 (Computer Science) | Q1 Computer Science, Artificial Intelligence
Yanfeng Jiang, Ning Sun, Xueshuo Xie, Fei Yang, Tao Li
{"title":"ADFQ-ViT:视觉变压器的激活-分布友好的训练后量化","authors":"Yanfeng Jiang ,&nbsp;Ning Sun ,&nbsp;Xueshuo Xie ,&nbsp;Fei Yang ,&nbsp;Tao Li","doi":"10.1016/j.neunet.2025.107289","DOIUrl":null,"url":null,"abstract":"<div><div>Vision Transformers (ViTs) have exhibited exceptional performance across diverse computer vision tasks, while their substantial parameter size incurs significantly increased memory and computational demands, impeding effective inference on resource-constrained devices. Quantization has emerged as a promising solution to mitigate these challenges, yet existing methods still suffer from significant accuracy loss at low-bit. We attribute this issue to the distinctive distributions of post-LayerNorm and post-GELU activations within ViTs, rendering conventional hardware-friendly quantizers ineffective, particularly in low-bit scenarios. To address this issue, we propose a novel framework called Activation-Distribution-Friendly post-training Quantization for Vision Transformers, ADFQ-ViT. Concretely, we introduce the Per-Patch Outlier-aware Quantizer to tackle irregular outliers in post-LayerNorm activations. This quantizer refines the granularity of the uniform quantizer to a per-patch level while retaining a minimal subset of values exceeding a threshold at full-precision. To handle the non-uniform distributions of post-GELU activations between positive and negative regions, we design the Shift-Log2 Quantizer, which shifts all elements to the positive region and then applies log2 quantization. Moreover, we present the Attention-score enhanced Module-wise Optimization which adjusts the parameters of each quantizer by reconstructing errors to further mitigate quantization error. Extensive experiments demonstrate ADFQ-ViT provides significant improvements over various baselines in image classification, object detection, and instance segmentation tasks at 4-bit. Specifically, when quantizing the ViT-B model to 4-bit, we achieve a 5.17% improvement in Top-1 accuracy on the ImageNet dataset. Our code is available at: <span><span>https://github.com/llwx593/adfq-vit.git</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"186 ","pages":"Article 107289"},"PeriodicalIF":6.3000,"publicationDate":"2025-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"ADFQ-ViT: Activation-Distribution-Friendly post-training Quantization for Vision Transformers\",\"authors\":\"Yanfeng Jiang ,&nbsp;Ning Sun ,&nbsp;Xueshuo Xie ,&nbsp;Fei Yang ,&nbsp;Tao Li\",\"doi\":\"10.1016/j.neunet.2025.107289\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Vision Transformers (ViTs) have exhibited exceptional performance across diverse computer vision tasks, while their substantial parameter size incurs significantly increased memory and computational demands, impeding effective inference on resource-constrained devices. Quantization has emerged as a promising solution to mitigate these challenges, yet existing methods still suffer from significant accuracy loss at low-bit. We attribute this issue to the distinctive distributions of post-LayerNorm and post-GELU activations within ViTs, rendering conventional hardware-friendly quantizers ineffective, particularly in low-bit scenarios. To address this issue, we propose a novel framework called Activation-Distribution-Friendly post-training Quantization for Vision Transformers, ADFQ-ViT. 
Concretely, we introduce the Per-Patch Outlier-aware Quantizer to tackle irregular outliers in post-LayerNorm activations. This quantizer refines the granularity of the uniform quantizer to a per-patch level while retaining a minimal subset of values exceeding a threshold at full-precision. To handle the non-uniform distributions of post-GELU activations between positive and negative regions, we design the Shift-Log2 Quantizer, which shifts all elements to the positive region and then applies log2 quantization. Moreover, we present the Attention-score enhanced Module-wise Optimization which adjusts the parameters of each quantizer by reconstructing errors to further mitigate quantization error. Extensive experiments demonstrate ADFQ-ViT provides significant improvements over various baselines in image classification, object detection, and instance segmentation tasks at 4-bit. Specifically, when quantizing the ViT-B model to 4-bit, we achieve a 5.17% improvement in Top-1 accuracy on the ImageNet dataset. Our code is available at: <span><span>https://github.com/llwx593/adfq-vit.git</span><svg><path></path></svg></span>.</div></div>\",\"PeriodicalId\":49763,\"journal\":{\"name\":\"Neural Networks\",\"volume\":\"186 \",\"pages\":\"Article 107289\"},\"PeriodicalIF\":6.3000,\"publicationDate\":\"2025-02-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Neural Networks\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0893608025001686\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neural Networks","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0893608025001686","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Vision Transformers (ViTs) have exhibited exceptional performance across diverse computer vision tasks, while their substantial parameter size incurs significantly increased memory and computational demands, impeding effective inference on resource-constrained devices. Quantization has emerged as a promising solution to mitigate these challenges, yet existing methods still suffer from significant accuracy loss at low-bit. We attribute this issue to the distinctive distributions of post-LayerNorm and post-GELU activations within ViTs, rendering conventional hardware-friendly quantizers ineffective, particularly in low-bit scenarios. To address this issue, we propose a novel framework called Activation-Distribution-Friendly post-training Quantization for Vision Transformers, ADFQ-ViT. Concretely, we introduce the Per-Patch Outlier-aware Quantizer to tackle irregular outliers in post-LayerNorm activations. This quantizer refines the granularity of the uniform quantizer to a per-patch level while retaining a minimal subset of values exceeding a threshold at full-precision. To handle the non-uniform distributions of post-GELU activations between positive and negative regions, we design the Shift-Log2 Quantizer, which shifts all elements to the positive region and then applies log2 quantization. Moreover, we present the Attention-score enhanced Module-wise Optimization which adjusts the parameters of each quantizer by reconstructing errors to further mitigate quantization error. Extensive experiments demonstrate ADFQ-ViT provides significant improvements over various baselines in image classification, object detection, and instance segmentation tasks at 4-bit. Specifically, when quantizing the ViT-B model to 4-bit, we achieve a 5.17% improvement in Top-1 accuracy on the ImageNet dataset. Our code is available at: https://github.com/llwx593/adfq-vit.git.
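For illustration, the sketch below renders the two activation quantizers described in the abstract as minimal PyTorch functions. It is an assumption-laden reading of the abstract, not the released implementation: the outlier threshold, the per-patch scale rule, and the shift-and-scale handling inside the log2 quantizer are illustrative choices; the exact method is in the linked repository.

```python
import torch
import torch.nn.functional as F

def per_patch_outlier_quantize(x, n_bits=4, threshold=6.0):
    """Sketch of a per-patch outlier-aware quantizer: uniform quantization with
    one scale per patch (token), while values whose magnitude exceeds `threshold`
    are retained at full precision. Threshold and scale rule are illustrative."""
    # x: (batch, num_patches, dim) post-LayerNorm activations
    qmax = 2 ** (n_bits - 1) - 1
    scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / qmax  # per-patch scale
    x_q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale
    return torch.where(x.abs() > threshold, x, x_q)  # keep outliers unquantized

def shift_log2_quantize(x, n_bits=4, eps=1e-8):
    """Sketch of a shift-then-log2 quantizer for post-GELU activations: shift all
    elements into the positive region, quantize log2 magnitudes, then shift back."""
    shift = x.min()
    x_pos = x - shift + eps                      # strictly positive tensor
    scale = x_pos.max()                          # normalize into (0, 1]
    levels = 2 ** n_bits - 1
    q = torch.clamp(torch.round(-torch.log2(x_pos / scale)), 0, levels)
    x_hat = scale * 2.0 ** (-q)                  # dequantize the log2 levels
    return x_hat + shift - eps                   # undo the shift

# Toy usage on random ViT-shaped activations (batch=2, 197 patches, dim=768).
acts = torch.randn(2, 197, 768)
print(per_patch_outlier_quantize(acts).shape)
print(shift_log2_quantize(F.gelu(acts)).shape)
```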
Source journal: Neural Networks (Engineering & Technology / Computer Science: Artificial Intelligence)
CiteScore: 13.90
Self-citation rate: 7.70%
Articles per year: 425
Review time: 67 days
Journal description: Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically-inspired artificial intelligence.