FPGA Implementation of Object Detection Accelerator Based on Vitis-AI

2021 11th International Conference on Information Science and Technology (ICIST), Pub Date: 2021-05-21, DOI: 10.1109/ICIST52614.2021.9440554

Jin Wang, Shenshen Gu


Citations: 21

Abstract



The emergence of YOLOv3 makes it possible to detect small targets. Because of the characteristics of the YOLO network itself, YOLOv3 places exceptionally high demands on computing power and memory bandwidth, so it usually needs to be deployed on a dedicated hardware acceleration platform. An FPGA is a logically reconfigurable hardware chip with substantial advantages in performance and power consumption, which makes it a good choice for deploying deep convolutional networks. In this paper, we propose a reconfigurable YOLOv3 FPGA hardware accelerator based on an ARM+FPGA architecture connected over the AXI bus. The YOLOv3 network is quantized with Vitis AI, and a series of operations such as model compression and data pre-processing save accelerator chip resources and reduce external-memory access time. Pipelined operation enables the FPGA to achieve higher throughput. Compared with a GPU implementation of the YOLOv3 model, the FPGA-based YOLOv3 accelerator has lower energy consumption and achieves higher throughput.
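
The abstract does not say how the quantization step works internally. As a rough illustration only, the sketch below assumes symmetric 8-bit fixed-point quantization with a power-of-two scale, which is a common scheme for FPGA inference. It is not the Vitis AI API and not necessarily the authors' method; the function names and the sample weights are hypothetical.

```python
import math

def choose_frac_bits(values, bit_width=8):
    """Pick the largest number of fractional bits such that the
    largest magnitude still fits in a signed integer of bit_width bits."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return bit_width - 1
    qmax = 2 ** (bit_width - 1) - 1  # 127 for 8 bits
    return math.floor(math.log2(qmax / max_abs))

def quantize(values, frac_bits, bit_width=8):
    """Scale by 2**frac_bits, round, and clamp to the signed integer range."""
    qmin = -(2 ** (bit_width - 1))
    qmax = 2 ** (bit_width - 1) - 1
    scale = 2.0 ** frac_bits
    return [max(qmin, min(qmax, round(v * scale))) for v in values]

def dequantize(q_values, frac_bits):
    """Map the integers back to approximate real values."""
    scale = 2.0 ** frac_bits
    return [q / scale for q in q_values]

# Hypothetical weights, chosen only for illustration.
weights = [0.42, -1.30, 0.07, 0.95]
frac_bits = choose_frac_bits(weights)
q_weights = quantize(weights, frac_bits)
restored = dequantize(q_weights, frac_bits)

print(frac_bits)   # number of fractional bits chosen
print(q_weights)   # 8-bit integer weights
print(restored)    # values the accelerator would effectively compute with
```

With a power-of-two scale, rescaling reduces to a bit shift, which is cheap in FPGA logic, and storing 8-bit integers instead of 32-bit floats cuts the weight traffic to external memory by roughly a factor of four.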
