An OpenCL-Based FPGA Accelerator for Compressed YOLOv2

Anrong Yang, Yuanhui Li, Hongqiao Shu, Jianlin Deng, Chuanzhao Ma, Zheng Li, Qigang Wang
Published in: 2019 International Conference on Field-Programmable Technology (ICFPT), 2019-12-01
DOI: 10.1109/ICFPT47387.2019.00036 (https://doi.org/10.1109/ICFPT47387.2019.00036)
Citations: 4

Abstract

Convolutional neural networks (CNNs) are widely used in computer vision applications, and GPUs have been the mainstream accelerators for them. Compared with GPUs, FPGAs offer high flexibility, low power consumption, and abundant DSP resources, which makes it possible for them to surpass GPUs in some scenarios. Recent progress in high-level synthesis tools has greatly improved FPGA development efficiency. In this paper, we design an OpenCL-based CNN accelerator for FPGAs and apply a variety of model compression techniques to the YOLOv2 model. The accelerator uses the Winograd algorithm to implement convolution efficiently, and it resolves the unaligned global memory accesses that the Winograd algorithm causes by means of an alignment stream buffer. The design makes full use of the available memory access bandwidth and uses all the available DSP resources. Parallelism is exploited along multiple dimensions for optimal performance. Our FPGA design achieves a latency of 10 ms per image, compared with 15 ms per image on an NVIDIA P100 GPU. We plan to release the design as open source so that the community can benefit from it and contribute to it.
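
The abstract names the Winograd algorithm but gives no tile size or kernel code. As a rough illustration only, the sketch below shows the smallest 1-D case, F(2,3), in plain Python: two outputs of a 3-tap filter are computed with 4 multiplications instead of 6. The function names and the choice of F(2,3) are assumptions made for this example, not details taken from the paper, and the authors' OpenCL implementation may look quite different.

```python
def winograd_f23(d, g):
    """Two outputs of a 3-tap filter over a 4-sample tile, using 4 multiplies.

    Illustrative F(2,3) only; the paper's actual tile size is not stated.
    """
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform (can be precomputed once per filter).
    u0 = g0
    u1 = (g0 + g1 + g2) / 2
    u2 = (g0 - g1 + g2) / 2
    u3 = g2
    # Input transform followed by 4 element-wise multiplications.
    m1 = (d0 - d2) * u0
    m2 = (d1 + d2) * u1
    m3 = (d2 - d1) * u2
    m4 = (d1 - d3) * u3
    # Output transform.
    return [m1 + m2 + m3, m2 - m3 - m4]


def direct_conv(d, g):
    """Reference: sliding-window correlation, 6 multiplies for 2 outputs."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


def winograd_conv1d(signal, g):
    """Apply F(2,3) along a signal of even length >= 4.

    Tiles advance by 2 samples and overlap their neighbours by 2 samples.
    """
    out = []
    for start in range(0, len(signal) - 3, 2):
        out.extend(winograd_f23(signal[start:start + 4], g))
    return out
```

Calling `winograd_conv1d(signal, g)` should match a direct sliding-window computation up to floating-point rounding. Note that consecutive tiles overlap, so each tile starts at an offset that need not coincide with a wide memory word boundary; this is one plausible reading of the unaligned-access issue the abstract mentions, not a confirmed description of it.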