Auto-ViT-Acc: An FPGA-Aware Automatic Acceleration Framework for Vision Transformer with Mixed-Scheme Quantization

2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) Pub Date : 2022-08-01 DOI:10.1109/FPL57034.2022.00027

Z. Li, Mengshu Sun, Alec Lu, Haoyu Ma, Geng Yuan, Yanyue Xie, Hao Tang, Yanyu Li, M. Leeser, Zhangyang Wang, Xue Lin, Zhenman Fang

{"title":"Auto-ViT-Acc: An FPGA-Aware Automatic Acceleration Framework for Vision Transformer with Mixed-Scheme Quantization","authors":"Z. Li, Mengshu Sun, Alec Lu, Haoyu Ma, Geng Yuan, Yanyue Xie, Hao Tang, Yanyu Li, M. Leeser, Zhangyang Wang, Xue Lin, Zhenman Fang","doi":"10.1109/FPL57034.2022.00027","DOIUrl":null,"url":null,"abstract":"Vision transformers (ViTs) are emerging with significantly improved accuracy in computer vision tasks. However, their complex architecture and enormous computation/storage demand impose urgent needs for new hardware accelerator design methodology. This work proposes an FPGA-aware automatic ViT acceleration framework based on the proposed mixed-scheme quantization. To the best of our knowledge, this is the first FPGA-based ViT acceleration framework exploring model quantization. Compared with state-of-the-art ViT quantization work (algorithmic approach only without hardware acceleration), our quantization achieves 0.47% to 1.36% higher Top-l accuracy under the same bit-width. Compared with the 32-bit floating-point baseline FPGA accelerator, our accelerator achieves around 5.6x improvement on the frame rate (i.e., 56.8 FPS vs. 10.0 FPS) with 0.71% accuracy drop on ImageNet dataset for DeiT-base.","PeriodicalId":380116,"journal":{"name":"2022 32nd International Conference on Field-Programmable Logic and Applications (FPL)","volume":"67 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 32nd International Conference on Field-Programmable Logic and Applications (FPL)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FPL57034.2022.00027","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 12

Abstract

Vision transformers (ViTs) are emerging with significantly improved accuracy in computer vision tasks. However, their complex architecture and enormous computation/storage demand impose urgent needs for new hardware accelerator design methodology. This work proposes an FPGA-aware automatic ViT acceleration framework based on the proposed mixed-scheme quantization. To the best of our knowledge, this is the first FPGA-based ViT acceleration framework exploring model quantization. Compared with state-of-the-art ViT quantization work (algorithmic approach only without hardware acceleration), our quantization achieves 0.47% to 1.36% higher Top-l accuracy under the same bit-width. Compared with the 32-bit floating-point baseline FPGA accelerator, our accelerator achieves around 5.6x improvement on the frame rate (i.e., 56.8 FPS vs. 10.0 FPS) with 0.71% accuracy drop on ImageNet dataset for DeiT-base.

查看原文本刊更多论文

auto - viti - acc:一种基于fpga的混合方案量化视觉变压器自动加速框架

视觉变压器(ViTs)的出现大大提高了计算机视觉任务的精度。然而，它们复杂的体系结构和巨大的计算/存储需求迫切需要新的硬件加速器设计方法。本文提出了一种基于混合方案量化的fpga感知ViT自动加速框架。据我们所知，这是第一个基于fpga的ViT加速框架探索模型量化。与最先进的ViT量化工作(仅采用算法方法而不使用硬件加速)相比，在相同比特宽度下，我们的量化实现了0.47%至1.36%的top - 1精度提高。与32位浮点基准FPGA加速器相比，我们的加速器在帧速率上实现了约5.6倍的改进(即56.8 FPS vs. 10.0 FPS)，在ImageNet数据集上的DeiT-base精度下降了0.71%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

2022 32nd International Conference on Field-Programmable Logic and Applications (FPL)

自引率

0.00%

发文量