A Real-time Image Processing Hardware Acceleration Method based on FPGA

2021 6th International Conference on Computational Intelligence and Applications (ICCIA). Pub Date: 2021-06-01. DOI: 10.1109/ICCIA52886.2021.00046

Haiying Yuan, Dong Ding, Zhongwei Fan, Zengyang Sun


Citations: 1

Abstract

Real-time images captured by a visual sensor usually contain considerable noise, and CNNs used for model inference and pattern recognition face thorny issues such as excessive computation, poor accuracy, and high resource occupancy. To address this, a CNN architecture was deployed heterogeneously on the Zynq platform to provide hardware acceleration for the image processing algorithm. The CNN was trained on the MNIST dataset on a PC under the Caffe framework to extract the network parameters. The convolutional layer, which carries the heavy computational load, was deployed on the FPGA for parallel computing to increase system speed; the input and output layers, which involve little computation, were placed on the ARM side to reduce resource consumption. Real-time images acquired by the camera were binarized to highlight image features and improve recognition accuracy. The hardware acceleration performance of the heterogeneously deployed CNN was verified through numerous experiments on handwritten-digit recognition. The experimental results indicate that the CNN hardware accelerator achieves an image recognition accuracy of up to 99.02%, which is largely equivalent to that of the PC client. With optimized instructions and a 100 MHz clock frequency, recognizing a single handwritten-digit image takes 0.53 s, which is 16 times faster than ARM-only operation. The maximum power consumption of the system is 2.606 W, far lower than that of general-purpose processors.
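The two image-side steps named in the abstract, binarization and convolution, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the abstract does not give the threshold, kernel size, or data layout, so the threshold of 128, the 2x2 kernel, the sample frame, and the function names `binarize` and `conv2d_valid` are all assumptions chosen for illustration.

```python
from typing import List

Image = List[List[float]]


def binarize(image: Image, threshold: float = 128.0) -> Image:
    """Map each pixel to 1.0 if it is >= threshold, else 0.0.

    The threshold value is an assumption; the abstract does not state one.
    """
    return [[1.0 if px >= threshold else 0.0 for px in row] for row in image]


def conv2d_valid(image: Image, kernel: Image) -> Image:
    """Stride-1, no-padding 2D cross-correlation (the usual CNN 'convolution')."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out: Image = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            # Multiply-accumulate over the kernel window. Each output pixel is
            # independent of the others, which is what makes this loop nest a
            # natural candidate for parallel hardware.
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


# Hypothetical 3x3 grayscale frame (values 0..255), for illustration only.
frame = [
    [10, 200, 30],
    [220, 240, 15],
    [5, 250, 90],
]
binary = binarize(frame)
feature_map = conv2d_valid(binary, [[1.0, 0.0], [0.0, 1.0]])
print(binary)
print(feature_map)
```

Because every output value of `conv2d_valid` depends only on its own input window, the multiply-accumulate work can be spread across parallel units, which is consistent with the abstract's choice to place the convolutional layer on the FPGA and leave the lighter layers on the ARM side.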
