A Real-time Image Processing Hardware Acceleration Method based on FPGA
Haiying Yuan, Dong Ding, Zhongwei Fan, Zengyang Sun
2021 6th International Conference on Computational Intelligence and Applications (ICCIA), June 2021
DOI: 10.1109/ICCIA52886.2021.00046
Citations: 1
Abstract
Real-time images captured by visual sensors usually contain considerable noise. CNNs used for model inference and pattern recognition face thorny issues such as excessive computation, poor accuracy, and high resource occupancy. Hence, a CNN architecture was deployed heterogeneously on the Zynq platform to provide hardware acceleration for the image processing algorithm. The CNN was trained on the MNIST dataset under the Caffe framework on a PC to extract the network parameters; the convolutional layer, which carries the heavy computational load, was deployed on the FPGA for parallel computing to increase system speed; the input and output layers, which require little computation, were placed on the ARM processor to reduce resource consumption; real-time images acquired by the camera were binarized to highlight image features and improve recognition accuracy; and the hardware acceleration performance of the heterogeneously deployed CNN was verified through numerous handwritten-digit recognition experiments. Experimental results indicate that the CNN hardware accelerator maintains an image recognition accuracy of up to 99.02%, largely equivalent to that achieved on the PC; with optimized instructions and a 100 MHz clock, recognizing a single handwritten-digit image takes 0.53 s, 16 times faster than running on the ARM processor alone; and the maximum power consumption of the system is 2.606 W, far lower than that of general-purpose processors.
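The abstract states that camera frames are binarized before recognition but does not say how the threshold is chosen. The following is a minimal sketch, assuming an 8-bit grayscale frame and a fixed global threshold; the function name `binarize`, the threshold of 128, and the `invert` option are hypothetical and not taken from the paper. Otsu's method is a common alternative for choosing the threshold automatically.

```python
from typing import List

Image = List[List[int]]  # 8-bit grayscale: rows of pixel values 0..255


def binarize(gray: Image, threshold: int = 128, invert: bool = True) -> Image:
    """Map each pixel to 0 or 255 using one fixed global threshold.

    threshold and invert are illustrative assumptions. With invert=True,
    dark ink on light paper becomes a light digit on a dark background,
    matching the MNIST convention of white digits on black.
    """
    out: Image = []
    for row in gray:
        if invert:
            # Dark pixels (ink) become foreground (255).
            out.append([255 if p < threshold else 0 for p in row])
        else:
            # Bright pixels become foreground (255).
            out.append([255 if p >= threshold else 0 for p in row])
    return out


# Usage: a 2x3 patch with dark strokes (low values) on bright paper.
patch = [[250, 30, 240],
         [20, 200, 10]]
print(binarize(patch))  # [[0, 255, 0], [255, 0, 255]]
```

A fixed threshold is cheap enough to run per frame on the ARM side, but it is sensitive to lighting; an adaptive or Otsu threshold trades extra computation for robustness.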
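The convolutional layer is moved to the FPGA because it dominates the arithmetic. The reference loop nest below is a sketch, not the authors' hardware design: a single-channel, stride-1, unpadded convolution computed as cross-correlation (as Caffe does), plus a helper that counts multiply-accumulate (MAC) operations. The names `conv2d_valid` and `conv_macs` are hypothetical, and the layer sizes in the usage line are an illustrative LeNet-style assumption, since the abstract does not give the network dimensions.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(x: Matrix, k: Matrix, bias: float = 0.0) -> Matrix:
    """Single-channel 2-D convolution (cross-correlation), stride 1, no padding.

    Each output pixel is a sum of kh*kw multiply-accumulates. These sums are
    independent across output pixels, which is the parallelism an FPGA
    implementation can exploit by unrolling or pipelining the loops.
    """
    h, w = len(x), len(x[0])
    kh, kw = len(k), len(k[0])
    out_h, out_w = h - kh + 1, w - kw + 1
    y: Matrix = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            acc = bias
            for u in range(kh):
                for v in range(kw):
                    acc += x[i + u][j + v] * k[u][v]
            y[i][j] = acc
    return y


def conv_macs(h: int, w: int, kh: int, kw: int, c_in: int, c_out: int) -> int:
    """MAC count of one conv layer with stride 1 and no padding."""
    return (h - kh + 1) * (w - kw + 1) * kh * kw * c_in * c_out


# Usage: a 2x2 diagonal kernel over a 3x3 input.
print(conv2d_valid([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[1, 0], [0, 1]]))
# [[6.0, 8.0], [12.0, 14.0]]

# Hypothetical first layer: 28x28 input, 5x5 kernels, 1 -> 20 channels.
print(conv_macs(28, 28, 5, 5, 1, 20))  # 288000
```

Even this small hypothetical layer needs hundreds of thousands of MACs per image, while the input and output stages are comparatively light, which is consistent with the abstract's split between FPGA and ARM.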
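The reported figures imply two derived quantities. They are worked out below under two assumptions: the 16x factor is exact (it is likely rounded), and the 2.606 W maximum bounds the power drawn during recognition. The results are the implied ARM-only time per image and an upper bound on energy per image for the heterogeneous system.

$$t_{\text{ARM}} \approx 16 \times t_{\text{FPGA+ARM}} = 16 \times 0.53\ \text{s} \approx 8.5\ \text{s}$$

$$E_{\text{image}} \le P_{\max}\, t_{\text{FPGA+ARM}} = 2.606\ \text{W} \times 0.53\ \text{s} \approx 1.38\ \text{J}$$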