Research on FPGA Based Convolutional Neural Network Acceleration Method

Tan Xiao, Man Tao
Venue: 2021 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA)
DOI: 10.1109/ICAICA52286.2021.9498022
Published: 2021-06-28
Citations: 3

Abstract

In recent years, with continuous breakthroughs in the field of algorithms, the computational complexity of target detection algorithms has kept rising. In the forward inference stage, many practical applications impose strict latency and power-consumption constraints. Realizing a low-power, low-cost, high-performance target detection platform has therefore attracted growing attention. Given the requirements of current mobile scenarios for high performance and low power consumption, a hardware acceleration architecture suitable for different CNNs is designed by combining the working principles of CNNs with the computing characteristics of FPGAs. The basic CNN operation units, including the convolution unit, the pooling unit, and the activation-function unit, are implemented through high-level synthesis (HLS). Optimization strategies such as pipelining, dynamic fixed-point quantization, and ping-pong buffering are adopted to reduce on-chip and off-chip memory accesses and storage-resource usage. Finally, two convolutional neural networks with different structures, the LeNet-5 classification network and the YOLOv2 detection network, are selected for functional verification and performance analysis. The experimental results show that the convolutional neural network FPGA accelerator designed in this paper delivers better performance with fewer resources and lower power consumption, and makes efficient use of the hardware resources on the FPGA.
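The paper itself does not publish its quantization code; as a hedged illustration of the dynamic fixed-point idea mentioned in the abstract, the sketch below (all function names hypothetical, not from the paper) picks a per-tensor fraction length from the data's dynamic range and maps floats to 16-bit fixed-point words, which is the usual software model of this technique before it is mapped to HLS.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Pick a fraction length so that max|x| fits in a signed 16-bit word.
// Choosing this per tensor/layer is what makes the fixed point "dynamic".
int choose_frac_bits(const std::vector<float>& x, int word_bits = 16) {
    float max_abs = 0.0f;
    for (float v : x) max_abs = std::max(max_abs, std::fabs(v));
    if (max_abs == 0.0f) return word_bits - 1;
    // Integer bits needed for the magnitude, plus one sign bit.
    int int_bits = static_cast<int>(std::ceil(std::log2(max_abs + 1e-9f))) + 1;
    return std::max(0, word_bits - 1 - std::max(int_bits, 0));
}

// Quantize to int16 with the chosen fraction length (round to nearest, saturate).
int16_t quantize(float v, int frac_bits) {
    long q = std::lround(v * static_cast<float>(1 << frac_bits));
    q = std::min<long>(std::max<long>(q, INT16_MIN), INT16_MAX);
    return static_cast<int16_t>(q);
}

// Recover the approximate real value from the fixed-point word.
float dequantize(int16_t q, int frac_bits) {
    return static_cast<float>(q) / static_cast<float>(1 << frac_bits);
}
```

With a tensor whose largest magnitude is 3.0, `choose_frac_bits` reserves 3 integer bits and leaves 12 fraction bits, so 0.5 quantizes to 2048; smaller-range layers automatically get more fraction bits and hence finer precision.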