A Real-time Image Processing Hardware Acceleration Method based on FPGA

Haiying Yuan, Dong Ding, Zhongwei Fan, Zengyang Sun
Published in: 2021 6th International Conference on Computational Intelligence and Applications (ICCIA)
Publication date: 2021-06-01
DOI: 10.1109/ICCIA52886.2021.00046 (https://doi.org/10.1109/ICCIA52886.2021.00046)
Citations: 1

Abstract

Real-time images captured by a visual sensor usually contain considerable noise. CNNs used for model inference and pattern recognition suffer from excessive computation, poor accuracy, and high resource occupancy. To address this, a CNN architecture was deployed heterogeneously on the Zynq platform to provide hardware acceleration for the image processing algorithm. The CNN was trained on the MNIST dataset on a PC under the Caffe framework to extract the network parameters. The convolutional layer, which carries the heavy computational load, was deployed on the FPGA for parallel computing to increase system speed, while the input and output layers, which require little computation, were placed on the ARM processor to reduce resource consumption. Real-time images acquired by the camera were binarized to highlight image features and improve recognition accuracy. The hardware acceleration performance of the heterogeneously deployed CNN was verified through numerous handwritten-numeral recognition experiments. The results show that the CNN hardware accelerator achieves an image recognition accuracy of up to 99.02%, largely equivalent to that of the client PC. With optimized instructions and a 100 MHz clock frequency, recognizing a single handwritten numeral image takes 0.53 s, which is 16 times faster than pure ARM operation. The maximum power consumption of the system is 2.606 W, far lower than that of general-purpose processors.
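The two image operations named in the abstract, binarizing the camera frame and running a convolutional layer, can be illustrated with a minimal software sketch. This is not the authors' implementation: the threshold of 128, the 2x2 kernel, the toy 4x4 frame, and the function names `binarize` and `conv2d_valid` are all assumptions chosen for illustration. The nested multiply-accumulate loops are the kind of work the abstract says was moved to the FPGA for parallel execution.

```python
from typing import List

Image = List[List[float]]


def binarize(image: Image, threshold: float = 128) -> Image:
    """Map each pixel to 1 if it is >= threshold, else 0.

    The threshold value is an assumption; the abstract does not state one.
    """
    return [[1 if px >= threshold else 0 for px in row] for row in image]


def conv2d_valid(image: Image, kernel: Image) -> Image:
    """Single-channel 'valid' 2D convolution as used in CNNs
    (no kernel flip, no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply-accumulate over one kernel window. Every output
            # pixel is independent, which is what makes this loop nest
            # a natural candidate for parallel hardware.
            acc = 0
            for u in range(kh):
                for v in range(kw):
                    acc += image[i + u][j + v] * kernel[u][v]
            row.append(acc)
        out.append(row)
    return out


# Toy 4x4 grayscale frame (hypothetical values, not from the paper).
frame = [
    [10, 200, 220, 15],
    [12, 210, 230, 20],
    [8, 190, 240, 25],
    [5, 180, 250, 30],
]
kernel = [[1, 0], [0, 1]]  # hypothetical 2x2 kernel

binary = binarize(frame)
feature_map = conv2d_valid(binary, kernel)
print(binary)       # each row becomes [0, 1, 1, 0]
print(feature_map)  # 3x3 map, each row [1, 2, 1]
```

In a heterogeneous split like the one the abstract describes, a routine such as `binarize` would plausibly stay on the processor side while the convolution loop nest is the part worth offloading, since its inner iterations do not depend on one another.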