Research on FPGA Based Convolutional Neural Network Acceleration Method

Tan Xiao, Man Tao
Venue: 2021 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA)
DOI: 10.1109/ICAICA52286.2021.9498022
Published: 2021-06-28
Citations: 3

Abstract

In recent years, with continuous breakthroughs in the field of algorithms, the computational complexity of target detection algorithms has kept rising. In the forward inference stage, many practical applications impose strict latency and power-consumption constraints. Realizing a low-power, low-cost, high-performance target detection platform has therefore attracted growing attention. Given the requirements of current mobile scenarios for high performance and low power consumption, a hardware acceleration architecture suitable for different CNNs is designed by combining the working principles of CNNs with the computing characteristics of FPGAs. The basic CNN operation units, including the convolution unit, the pooling unit, and the activation-function unit, are implemented through high-level synthesis (HLS). Optimization strategies such as pipelining, dynamic fixed-point quantization, and ping-pong buffering are adopted to reduce on-chip and off-chip memory accesses and storage-resource usage. Finally, two convolutional neural networks with different structures, the LeNet-5 classification network and the YOLOv2 detection network, are selected for functional verification and performance analysis. The experimental results show that the convolutional neural network FPGA accelerator designed in this paper delivers better performance with fewer resources and lower power consumption, and makes efficient use of the hardware resources on the FPGA.
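The paper itself does not publish its quantization code; as a hedged illustration of the dynamic fixed-point idea mentioned in the abstract, the sketch below (all function names hypothetical, not from the paper) picks a per-tensor fraction length from the data's dynamic range and maps floats to 16-bit fixed-point words, which is the usual software model of this technique before it is mapped to HLS.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Pick a fraction length so that max|x| fits in a signed 16-bit word.
// Choosing this per tensor/layer is what makes the fixed point "dynamic".
int choose_frac_bits(const std::vector<float>& x, int word_bits = 16) {
    float max_abs = 0.0f;
    for (float v : x) max_abs = std::max(max_abs, std::fabs(v));
    if (max_abs == 0.0f) return word_bits - 1;
    // Integer bits needed for the magnitude, plus one sign bit.
    int int_bits = static_cast<int>(std::ceil(std::log2(max_abs + 1e-9f))) + 1;
    return std::max(0, word_bits - 1 - std::max(int_bits, 0));
}

// Quantize to int16 with the chosen fraction length (round to nearest, saturate).
int16_t quantize(float v, int frac_bits) {
    long q = std::lround(v * static_cast<float>(1 << frac_bits));
    q = std::min<long>(std::max<long>(q, INT16_MIN), INT16_MAX);
    return static_cast<int16_t>(q);
}

// Recover the approximate real value from the fixed-point word.
float dequantize(int16_t q, int frac_bits) {
    return static_cast<float>(q) / static_cast<float>(1 << frac_bits);
}
```

With a tensor whose largest magnitude is 3.0, `choose_frac_bits` reserves 3 integer bits and leaves 12 fraction bits, so 0.5 quantizes to 2048; smaller-range layers automatically get more fraction bits and hence finer precision.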