An OpenCL-Based FPGA Accelerator for Compressed YOLOv2

2019 International Conference on Field-Programmable Technology (ICFPT). Pub Date: 2019-12-01. DOI: 10.1109/ICFPT47387.2019.00036

Anrong Yang, Yuanhui Li, Hongqiao Shu, Jianlin Deng, Chuanzhao Ma, Zheng Li, Qigang Wang


Citations: 4

Abstract

Convolutional neural networks (CNNs) are widely used in computer vision applications, and GPUs have been the mainstream accelerators for them. Compared with GPUs, FPGAs offer high flexibility, low power consumption, and abundant DSP resources, which make it possible for them to surpass GPUs in some scenarios. Recent progress in high-level synthesis tools has greatly improved FPGA development efficiency. In this paper, we design an OpenCL-based CNN accelerator for FPGAs and apply a variety of model compression techniques to the YOLOv2 model. The accelerator uses the Winograd algorithm to implement convolution efficiently, and an alignment stream buffer resolves the unaligned global memory accesses that the Winograd algorithm causes. The design makes full use of the available memory access bandwidth and utilizes all available DSP resources. Parallelism is exploited in multiple dimensions for optimal performance. Our FPGA design reaches a latency of 10 ms per image, compared with 15 ms per image on an NVIDIA P100 GPU. We plan to make our design open source so that the community can benefit from it and contribute to it.
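To illustrate the Winograd idea the abstract refers to, here is a minimal Python sketch of the one-dimensional F(2,3) case: two outputs of a 3-tap filter are computed with 4 multiplications instead of 6. This is not the authors' OpenCL kernel; the abstract does not state the tile size or transform used, so the F(2,3) choice and the function names (`winograd_f23`, `winograd_conv`, `direct_conv`) are assumptions made for illustration. CNN accelerators typically use a two-dimensional variant of the same scheme.

```python
def winograd_f23(d, g):
    """Compute two outputs of a 3-tap filter g over a 4-element tile d
    using 4 multiplications (Winograd F(2,3))."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform: depends only on g, so it can be precomputed once.
    u0 = g0
    u1 = (g0 + g1 + g2) / 2
    u2 = (g0 - g1 + g2) / 2
    u3 = g2
    # Input transform followed by the four element-wise multiplications.
    m0 = (d0 - d2) * u0
    m1 = (d1 + d2) * u1
    m2 = (d2 - d1) * u2
    m3 = (d1 - d3) * u3
    # Output transform.
    return [m0 + m1 + m2, m1 - m2 - m3]


def direct_conv(signal, g):
    """Reference sliding-window result (3 multiplications per output)."""
    return [sum(signal[i + k] * g[k] for k in range(3))
            for i in range(len(signal) - 2)]


def winograd_conv(signal, g):
    """Tile the signal into 4-element windows that advance by 2.
    Assumes len(signal) - 2 is even so the tiles cover every output."""
    out = []
    for start in range(0, len(signal) - 3, 2):
        out.extend(winograd_f23(signal[start:start + 4], g))
    return out


if __name__ == "__main__":
    signal = [1, 2, 3, 4, 5, 6, 7, 8]
    g = [1.0, 2.0, -1.0]
    print(winograd_conv(signal, g))
    print(direct_conv(signal, g))
```

In this sketch each input tile is 4 elements wide but advances by only 2, so consecutive tiles overlap and their start offsets are not multiples of the tile width. That overlap is one plausible source of the unaligned global memory accesses the abstract mentions, though the abstract itself does not spell out the cause.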

