{"title":"基于卷积神经网络的实时视频目标识别","authors":"Byungik Ahn","doi":"10.1109/IJCNN.2015.7280718","DOIUrl":null,"url":null,"abstract":"A convolutional neural network (CNN) is implemented on a field-programmable gate array (FPGA) and used for recognizing objects in real-time video streams. In this system, an image pyramid is constructed by successively down-scaling the input video stream. Image blocks are extracted from the image pyramid and classified by the CNN core. The detected parts are then marked on the output video frames. The CNN core is composed of six hardware neurons and two receptor units. The hardware neurons are designed as fully-pipelined digital circuits synchronized with the system clock, and are used to compute the model neurons in a time-sharing manner. The receptor units scan the input image for local receptive fields and continuously supply data to the hardware neurons as inputs. The CNN core module is controlled according to the contents of a table describing the sequence of computational stages and containing the system parameters required to control each stage. The use of this table makes the hardware system more flexible, and various CNN configurations can be accommodated without re-designing the system. The system implemented on a mid-range FPGA achieves a computational speed greater than 170,000 classifications per second, and performs scale-invariant object recognition from a 720×480 video stream at a speed of 60 fps. This work is a part of a commercial project, and the system is targeted for recognizing any pre-trained objects with a small physical volume and low power consumption.","PeriodicalId":6539,"journal":{"name":"2015 International Joint Conference on Neural Networks (IJCNN)","volume":"92 1","pages":"1-7"},"PeriodicalIF":0.0000,"publicationDate":"2015-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"17","resultStr":"{\"title\":\"Real-time video object recognition using convolutional neural network\",\"authors\":\"Byungik Ahn\",\"doi\":\"10.1109/IJCNN.2015.7280718\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A convolutional neural network (CNN) is implemented on a field-programmable gate array (FPGA) and used for recognizing objects in real-time video streams. In this system, an image pyramid is constructed by successively down-scaling the input video stream. Image blocks are extracted from the image pyramid and classified by the CNN core. The detected parts are then marked on the output video frames. The CNN core is composed of six hardware neurons and two receptor units. The hardware neurons are designed as fully-pipelined digital circuits synchronized with the system clock, and are used to compute the model neurons in a time-sharing manner. The receptor units scan the input image for local receptive fields and continuously supply data to the hardware neurons as inputs. The CNN core module is controlled according to the contents of a table describing the sequence of computational stages and containing the system parameters required to control each stage. The use of this table makes the hardware system more flexible, and various CNN configurations can be accommodated without re-designing the system. The system implemented on a mid-range FPGA achieves a computational speed greater than 170,000 classifications per second, and performs scale-invariant object recognition from a 720×480 video stream at a speed of 60 fps. This work is a part of a commercial project, and the system is targeted for recognizing any pre-trained objects with a small physical volume and low power consumption.\",\"PeriodicalId\":6539,\"journal\":{\"name\":\"2015 International Joint Conference on Neural Networks (IJCNN)\",\"volume\":\"92 1\",\"pages\":\"1-7\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-07-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"17\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2015 International Joint Conference on Neural Networks (IJCNN)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IJCNN.2015.7280718\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 International Joint Conference on Neural Networks (IJCNN)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IJCNN.2015.7280718","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Real-time video object recognition using convolutional neural network
A convolutional neural network (CNN) is implemented on a field-programmable gate array (FPGA) and used for recognizing objects in real-time video streams. In this system, an image pyramid is constructed by successively down-scaling the input video stream. Image blocks are extracted from the image pyramid and classified by the CNN core. The detected parts are then marked on the output video frames. The CNN core is composed of six hardware neurons and two receptor units. The hardware neurons are designed as fully-pipelined digital circuits synchronized with the system clock, and are used to compute the model neurons in a time-sharing manner. The receptor units scan the input image for local receptive fields and continuously supply data to the hardware neurons as inputs. The CNN core module is controlled according to the contents of a table describing the sequence of computational stages and containing the system parameters required to control each stage. The use of this table makes the hardware system more flexible, and various CNN configurations can be accommodated without re-designing the system. The system implemented on a mid-range FPGA achieves a computational speed greater than 170,000 classifications per second, and performs scale-invariant object recognition from a 720×480 video stream at a speed of 60 fps. This work is a part of a commercial project, and the system is targeted for recognizing any pre-trained objects with a small physical volume and low power consumption.