{"title":"PSAQ-ViT V2:为视觉变换器实现精确和通用的无数据量化","authors":"Zhikai Li, Mengjuan Chen, Junrui Xiao, Qingyi Gu","doi":"10.1109/TNNLS.2023.3301007","DOIUrl":null,"url":null,"abstract":"<p><p>Data-free quantization can potentially address data privacy and security concerns in model compression and thus has been widely investigated. Recently, patch similarity aware data-free quantization for vision transformers (PSAQ-ViT) designs a relative value metric, patch similarity, to generate data from pretrained vision transformers (ViTs), achieving the first attempt at data-free quantization for ViTs. In this article, we propose PSAQ-ViT V2, a more accurate and general data-free quantization framework for ViTs, built on top of PSAQ-ViT. More specifically, following the patch similarity metric in PSAQ-ViT, we introduce an adaptive teacher-student strategy, which facilitates the constant cyclic evolution of the generated samples and the quantized model in a competitive and interactive fashion under the supervision of the full-precision (FP) model (teacher), thus significantly improving the accuracy of the quantized model. Moreover, without the auxiliary category guidance, we employ the task-and model-independent prior information, making the general-purpose scheme compatible with a broad range of vision tasks and models. Extensive experiments are conducted on various models on image classification, object detection, and semantic segmentation tasks, and PSAQ-ViT V2, with the naive quantization strategy and without access to real-world data, consistently achieves competitive results, showing potential as a powerful baseline on data-free quantization for ViTs. For instance, with Swin-S as the (backbone) model, 8-bit quantization reaches 82.13 top-1 accuracy on ImageNet, 50.9 box AP and 44.1 mask AP on COCO, and 47.2 mean Intersection over Union (mIoU) on ADE20K. We hope that accurate and general PSAQ-ViT V2 can serve as a potential and practice solution in real-world applications involving sensitive data. Code is released and merged at: https://github.com/zkkli/PSAQ-ViT.</p>","PeriodicalId":13303,"journal":{"name":"IEEE transactions on neural networks and learning systems","volume":"PP ","pages":""},"PeriodicalIF":10.2000,"publicationDate":"2023-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"PSAQ-ViT V2: Toward Accurate and General Data-Free Quantization for Vision Transformers.\",\"authors\":\"Zhikai Li, Mengjuan Chen, Junrui Xiao, Qingyi Gu\",\"doi\":\"10.1109/TNNLS.2023.3301007\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Data-free quantization can potentially address data privacy and security concerns in model compression and thus has been widely investigated. Recently, patch similarity aware data-free quantization for vision transformers (PSAQ-ViT) designs a relative value metric, patch similarity, to generate data from pretrained vision transformers (ViTs), achieving the first attempt at data-free quantization for ViTs. In this article, we propose PSAQ-ViT V2, a more accurate and general data-free quantization framework for ViTs, built on top of PSAQ-ViT. 
More specifically, following the patch similarity metric in PSAQ-ViT, we introduce an adaptive teacher-student strategy, which facilitates the constant cyclic evolution of the generated samples and the quantized model in a competitive and interactive fashion under the supervision of the full-precision (FP) model (teacher), thus significantly improving the accuracy of the quantized model. Moreover, without the auxiliary category guidance, we employ the task-and model-independent prior information, making the general-purpose scheme compatible with a broad range of vision tasks and models. Extensive experiments are conducted on various models on image classification, object detection, and semantic segmentation tasks, and PSAQ-ViT V2, with the naive quantization strategy and without access to real-world data, consistently achieves competitive results, showing potential as a powerful baseline on data-free quantization for ViTs. For instance, with Swin-S as the (backbone) model, 8-bit quantization reaches 82.13 top-1 accuracy on ImageNet, 50.9 box AP and 44.1 mask AP on COCO, and 47.2 mean Intersection over Union (mIoU) on ADE20K. We hope that accurate and general PSAQ-ViT V2 can serve as a potential and practice solution in real-world applications involving sensitive data. Code is released and merged at: https://github.com/zkkli/PSAQ-ViT.</p>\",\"PeriodicalId\":13303,\"journal\":{\"name\":\"IEEE transactions on neural networks and learning systems\",\"volume\":\"PP \",\"pages\":\"\"},\"PeriodicalIF\":10.2000,\"publicationDate\":\"2023-08-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on neural networks and learning systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1109/TNNLS.2023.3301007\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on neural networks and learning systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1109/TNNLS.2023.3301007","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Abstract
Data-free quantization can potentially address data privacy and security concerns in model compression and has therefore been widely investigated. Recently, patch similarity aware data-free quantization for vision transformers (PSAQ-ViT) designs a relative value metric, patch similarity, to generate data from pretrained vision transformers (ViTs), making the first attempt at data-free quantization for ViTs. In this article, we propose PSAQ-ViT V2, a more accurate and general data-free quantization framework for ViTs, built on top of PSAQ-ViT. More specifically, following the patch similarity metric in PSAQ-ViT, we introduce an adaptive teacher-student strategy, which facilitates the constant cyclic evolution of the generated samples and the quantized model in a competitive and interactive fashion under the supervision of the full-precision (FP) model (teacher), thus significantly improving the accuracy of the quantized model. Moreover, without auxiliary category guidance, we employ task- and model-independent prior information, making the general-purpose scheme compatible with a broad range of vision tasks and models. Extensive experiments are conducted on various models across image classification, object detection, and semantic segmentation tasks, and PSAQ-ViT V2, with a naive quantization strategy and without access to real-world data, consistently achieves competitive results, showing potential as a powerful baseline for data-free quantization of ViTs. For instance, with Swin-S as the (backbone) model, 8-bit quantization reaches 82.13% top-1 accuracy on ImageNet, 50.9 box AP and 44.1 mask AP on COCO, and 47.2 mean Intersection over Union (mIoU) on ADE20K. We hope that the accurate and general PSAQ-ViT V2 can serve as a potential and practical solution for real-world applications involving sensitive data. Code is released and merged at https://github.com/zkkli/PSAQ-ViT.
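To make the two ingredients named in the abstract more concrete, below is a minimal PyTorch-style sketch of (i) a differentiable patch-similarity entropy objective for synthesizing samples from a pretrained ViT and (ii) an alternating teacher-student cycle in which the synthetic batch and the quantized student evolve together under the FP teacher. This is an illustration of the general idea only: the helper names (`patch_similarity_entropy`, `teacher_student_cycle`, `fp_model`, `q_model`, `synth_images`), the timm-style `forward_features` call for extracting patch tokens, and the plain KL distillation loss are assumptions, not the authors' released implementation (see https://github.com/zkkli/PSAQ-ViT for that).

```python
import torch
import torch.nn.functional as F


def patch_similarity_entropy(patch_tokens: torch.Tensor, bins: int = 32) -> torch.Tensor:
    """Entropy of the pairwise cosine-similarity histogram over patch tokens.

    patch_tokens: (B, N, D) embeddings from an intermediate ViT block. Real
    images tend to produce a diverse (high-entropy) similarity distribution,
    so sample synthesis ascends this objective starting from Gaussian noise.
    """
    x = F.normalize(patch_tokens, dim=-1)                      # (B, N, D)
    sim = torch.bmm(x, x.transpose(1, 2)).flatten(1)           # (B, N*N) cosine similarities
    centers = torch.linspace(-1.0, 1.0, bins, device=sim.device)
    # Soft (Gaussian-kernel) histogram keeps the entropy differentiable.
    weights = torch.exp(-((sim.unsqueeze(-1) - centers) ** 2) / 0.01)
    probs = weights.sum(dim=1)
    probs = probs / probs.sum(dim=-1, keepdim=True)
    return -(probs * (probs + 1e-8).log()).sum(dim=-1).mean()


def teacher_student_cycle(fp_model, q_model, synth_images, steps=100, lr_img=0.05, lr_q=1e-4):
    """Alternately evolve the synthetic batch and distill the quantized student.

    fp_model is the frozen full-precision teacher, q_model the quantized
    student; forward_features (timm-style, assumed) returns patch tokens.
    """
    for p in fp_model.parameters():
        p.requires_grad_(False)                                # teacher stays frozen
    synth_images = synth_images.detach().clone().requires_grad_(True)
    img_opt = torch.optim.Adam([synth_images], lr=lr_img)
    q_opt = torch.optim.Adam(q_model.parameters(), lr=lr_q)
    for _ in range(steps):
        # 1) Sample evolution: push the batch toward real-image patch statistics.
        img_opt.zero_grad()
        tokens = fp_model.forward_features(synth_images)
        (-patch_similarity_entropy(tokens)).backward()
        img_opt.step()
        # 2) Student update: match the teacher's predictions on the fresh batch
        #    (plain KL distillation as a stand-in for the paper's competitive objective).
        q_opt.zero_grad()
        with torch.no_grad():
            t_logits = fp_model(synth_images)
        s_logits = q_model(synth_images.detach())
        F.kl_div(F.log_softmax(s_logits, dim=-1),
                 F.softmax(t_logits, dim=-1),
                 reduction="batchmean").backward()
        q_opt.step()
    return synth_images.detach()
```

The alternation is the point of the sketch: each pass first refreshes the synthetic samples against the patch-similarity objective, then updates the student on exactly those samples, so the two co-evolve under the fixed teacher rather than being optimized in isolation.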
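The "naive quantization strategy" the abstract refers to is typically understood as plain min-max uniform quantization applied per tensor, with no calibration search or reconstruction tricks. The following is a generic 8-bit fake-quantization sketch of that idea, not the paper's exact quantizer:

```python
import torch


def uniform_quantize(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Fake-quantize x with a single asymmetric min-max scale and zero-point."""
    qmin, qmax = 0, 2 ** num_bits - 1
    x_min, x_max = x.min(), x.max()
    scale = (x_max - x_min).clamp(min=1e-8) / (qmax - qmin)
    zero_point = torch.round(-x_min / scale)
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale   # dequantize back for simulated inference


# Example: 8-bit fake quantization of a random weight tensor.
w = torch.randn(64, 64)
w_q = uniform_quantize(w, num_bits=8)
print((w - w_q).abs().max())   # quantization error, roughly half a quantization step
```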
Journal introduction:
The focus of IEEE Transactions on Neural Networks and Learning Systems is to present scholarly articles discussing the theory, design, and applications of neural networks as well as other learning systems. The journal primarily highlights technical and scientific research in this domain.