{"title":"A 109-GOPs/W FPGA-Based Vision Transformer Accelerator With Weight-Loop Dataflow Featuring Data Reusing and Resource Saving","authors":"Yueqi Zhang;Lichen Feng;Hongwei Shan;Zhangming Zhu","doi":"10.1109/TCSVT.2024.3439600","DOIUrl":null,"url":null,"abstract":"The Vision Transformer (ViT) models have demonstrated excellent performance in computer vision tasks, but a large amount of computation and memory access for massive matrix multiplications lead to degraded hardware performance compared to convolutional neural network (CNN). In this paper, we propose a ViT accelerator with a novel “Weight-Loop” dataflow and its computing unit, for efficient matrix multiplication computation. By data partitioning and rearrangement, the number of memory accesses and the number of registers are greatly reduced, and the adder trees are eliminated. A computation pipeline with the proposed dataflow scheduling method is constructed to maintain a high utilization rate through zero bubble switching. Moreover, a novel accurate dual INT8 multiply-accumulate (DI8MAC) method for DSP optimization is introduced to eliminate the additional correction circuits by weight encoding. Verified in the Xilinx XCZU9EG FPGA, the proposed ViT accelerator achieves the lowest inference latencies of 3.91 ms and 13.98 ms for ViT-S and ViT-B, respectively. The throughput of the accelerator can reach up to 2330.2 GOPs with an energy efficiency of 109 GOPs/W, showing a significant improvement compared to the state-of-the-art works.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"34 12","pages":"13596-13610"},"PeriodicalIF":8.3000,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems for Video Technology","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10623817/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Abstract
Vision Transformer (ViT) models have demonstrated excellent performance in computer vision tasks, but the large amount of computation and memory access required by their massive matrix multiplications leads to degraded hardware performance compared to convolutional neural networks (CNNs). In this paper, we propose a ViT accelerator built around a novel "Weight-Loop" dataflow and its corresponding computing unit for efficient matrix multiplication. Through data partitioning and rearrangement, the numbers of memory accesses and registers are greatly reduced, and adder trees are eliminated. A computation pipeline with the proposed dataflow scheduling method maintains a high utilization rate through zero-bubble switching. Moreover, a novel accurate dual INT8 multiply-accumulate (DI8MAC) method for DSP optimization is introduced, which eliminates the additional correction circuits through weight encoding. Verified on a Xilinx XCZU9EG FPGA, the proposed ViT accelerator achieves the lowest reported inference latencies of 3.91 ms and 13.98 ms for ViT-S and ViT-B, respectively. Its throughput reaches up to 2330.2 GOPs with an energy efficiency of 109 GOPs/W, a significant improvement over state-of-the-art works.
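The abstract does not detail the DI8MAC scheme itself, but the underlying idea of packing two INT8 multiplications into one wide DSP multiplier is well established (e.g., the dual-INT8 DSP48E2 technique described in Xilinx application material). The Python sketch below is a minimal emulation of that baseline trick, with illustrative names and field widths; the `hi += 1` branch models the kind of extra correction logic that, per the abstract, the authors' weight encoding eliminates.

```python
def dual_int8_mac(a: int, w0: int, w1: int) -> tuple[int, int]:
    """Emulate a*w0 and a*w1 with a single wide multiply, the way one
    FPGA DSP slice can be shared between two INT8 weights.
    All inputs are signed 8-bit integers (-128..127)."""
    SHIFT = 18  # field spacing: 18 bits hold any signed 16-bit product

    packed = (w0 << SHIFT) + w1        # both weights in one ~27-bit operand
    product = a * packed               # one multiply: a*w0*2**SHIFT + a*w1

    lo = product & ((1 << SHIFT) - 1)  # low field: a*w1, modulo 2**SHIFT
    hi = product >> SHIFT              # high field: a*w0, possibly off by one

    # Baseline correction circuit: if a*w1 is negative, its borrow has
    # leaked into the high field and must be added back.
    if lo >= 1 << (SHIFT - 1):
        lo -= 1 << SHIFT
        hi += 1

    return hi, lo

# Quick check against two independent INT8 multiplies.
for a, w0, w1 in [(3, -5, 7), (3, 5, -7), (-128, 127, -128)]:
    assert dual_int8_mac(a, w0, w1) == (a * w0, a * w1)
```

In RTL, the same structure maps onto a single 27x18 DSP multiplier so that each DSP produces two INT8 products per cycle; the abstract's claim is that a suitable encoding of the packed weights makes the conditional fix-up unnecessary.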
Journal Introduction
The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.