M2-ViT: Accelerating Hybrid Vision Transformers With Two-Level Mixed Quantization

IF 2.8 | Zone 2, Engineering & Technology | Q2, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2025-01-15 DOI:10.1109/TVLSI.2024.3525184

Yanbiao Liang;Huihong Shi;Zhongfeng Wang

Vol. 33, No. 5, pp. 1492-1496 | https://ieeexplore.ieee.org/document/10843138/


Abstract

Although vision transformers (ViTs) have achieved significant success, their intensive computation and substantial memory overhead make deployment on edge devices challenging. To address this, efficient ViTs have emerged, typically featuring convolution-transformer hybrid architectures that improve both accuracy and hardware efficiency. Prior work has explored quantization for efficient ViTs, combining the hardware efficiency of hybrid architectures with that of quantization, but it focuses on uniform quantization and overlooks the potential advantages of mixed quantization. Meanwhile, although several works have studied mixed quantization for standard ViTs, they are not directly applicable to hybrid ViTs, whose algorithmic and hardware characteristics are distinct. To bridge this gap, we present M2-ViT to accelerate convolution-transformer hybrid efficient ViTs with two-level mixed quantization (M2Q). Specifically, we introduce a hardware-friendly M2Q strategy, characterized by both mixed quantization precision and mixed quantization schemes (uniform and power-of-two (PoT)), to exploit the architectural properties of efficient ViTs. We further build a dedicated accelerator with heterogeneous computing engines to translate these algorithmic benefits into real hardware improvements. Experimental results validate the approach, showing an average energy-delay product (EDP) saving of 80% with comparable quantization accuracy relative to prior work. Code is available at https://github.com/lybbill/M2ViT.
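To make the two quantization schemes named in the abstract concrete, the following is a minimal sketch of generic symmetric uniform quantization and power-of-two (PoT) quantization. It is not taken from the paper or its repository: the function names, the 4-bit setting, the clamping of small values to the smallest PoT level, and the exponent encoding are all illustrative assumptions. The last helper shows why PoT weights are attractive in hardware: multiplying an integer activation by 2^(-k) reduces to a right shift.

```python
import math

def uniform_quantize(w, bits, max_abs):
    """Symmetric uniform quantization: snap w to an evenly spaced grid.

    Illustrative only; the grid has 2**(bits-1) - 1 positive levels.
    """
    levels = 2 ** (bits - 1) - 1            # e.g. 7 for signed 4-bit
    scale = max_abs / levels                # step between adjacent levels
    q = max(-levels, min(levels, round(w / scale)))
    return q * scale

def pot_quantize(w, bits, max_abs):
    """Power-of-two quantization: snap |w| to max_abs * 2**(-k).

    Assumed encoding (hypothetical): one sign bit, one code reserved for
    zero, and the remaining codes used as exponents k = 0 .. n_exp - 1.
    Values below the smallest level are clamped to it, not flushed to zero.
    """
    if w == 0:
        return 0.0
    n_exp = 2 ** (bits - 1) - 1
    k = round(-math.log2(abs(w) / max_abs))  # nearest exponent in log domain
    k = max(0, min(n_exp - 1, k))
    return math.copysign(max_abs * 2.0 ** (-k), w)

def multiply_by_pot(x_int, k):
    """Multiply an integer activation by 2**(-k) using only a shift."""
    return x_int >> k

if __name__ == "__main__":
    for w in (0.9, 0.3, -0.3, 0.05):
        print(w, uniform_quantize(w, 4, 1.0), pot_quantize(w, 4, 1.0))
    print(multiply_by_pot(96, 2))
```

Under these assumptions the PoT grid is dense near zero and sparse near the maximum, while the uniform grid is evenly spaced, which is the usual motivation for choosing between the two schemes depending on how the values in a layer are distributed.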


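Source journal

IEEE Transactions on Very Large Scale Integration (VLSI) Systems (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 6.40

Self-citation rate: 7.10%

Articles published: 187

Review time: 3.6 months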

About the journal: The IEEE Transactions on VLSI Systems is published as a monthly journal under the co-sponsorship of the IEEE Circuits and Systems Society, the IEEE Computer Society, and the IEEE Solid-State Circuits Society. Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chip and wafer fabrication, packaging, testing, and systems applications. Generation of specifications, design, and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor, and process levels. To address this critical area through a common forum, the IEEE Transactions on VLSI Systems was founded. The editorial board, consisting of international experts, invites original papers that emphasize the novel systems integration aspects of microelectronic systems, including interactions among systems design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chip and wafer fabrication, testing and packaging, and systems-level qualification. Thus, the coverage of these Transactions focuses on VLSI/ULSI microelectronic systems integration.