Real-time video object recognition using convolutional neural network

Byungik Ahn
{"title":"Real-time video object recognition using convolutional neural network","authors":"Byungik Ahn","doi":"10.1109/IJCNN.2015.7280718","DOIUrl":null,"url":null,"abstract":"A convolutional neural network (CNN) is implemented on a field-programmable gate array (FPGA) and used for recognizing objects in real-time video streams. In this system, an image pyramid is constructed by successively down-scaling the input video stream. Image blocks are extracted from the image pyramid and classified by the CNN core. The detected parts are then marked on the output video frames. The CNN core is composed of six hardware neurons and two receptor units. The hardware neurons are designed as fully-pipelined digital circuits synchronized with the system clock, and are used to compute the model neurons in a time-sharing manner. The receptor units scan the input image for local receptive fields and continuously supply data to the hardware neurons as inputs. The CNN core module is controlled according to the contents of a table describing the sequence of computational stages and containing the system parameters required to control each stage. The use of this table makes the hardware system more flexible, and various CNN configurations can be accommodated without re-designing the system. The system implemented on a mid-range FPGA achieves a computational speed greater than 170,000 classifications per second, and performs scale-invariant object recognition from a 720×480 video stream at a speed of 60 fps. This work is a part of a commercial project, and the system is targeted for recognizing any pre-trained objects with a small physical volume and low power consumption.","PeriodicalId":6539,"journal":{"name":"2015 International Joint Conference on Neural Networks (IJCNN)","volume":"92 1","pages":"1-7"},"PeriodicalIF":0.0000,"publicationDate":"2015-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"17","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 International Joint Conference on Neural Networks (IJCNN)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IJCNN.2015.7280718","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 17

Abstract

A convolutional neural network (CNN) is implemented on a field-programmable gate array (FPGA) and used for recognizing objects in real-time video streams. In this system, an image pyramid is constructed by successively down-scaling the input video stream. Image blocks are extracted from the image pyramid and classified by the CNN core. The detected parts are then marked on the output video frames. The CNN core is composed of six hardware neurons and two receptor units. The hardware neurons are designed as fully-pipelined digital circuits synchronized with the system clock, and are used to compute the model neurons in a time-sharing manner. The receptor units scan the input image for local receptive fields and continuously supply data to the hardware neurons as inputs. The CNN core module is controlled according to the contents of a table describing the sequence of computational stages and containing the system parameters required to control each stage. The use of this table makes the hardware system more flexible, and various CNN configurations can be accommodated without re-designing the system. The system implemented on a mid-range FPGA achieves a computational speed greater than 170,000 classifications per second, and performs scale-invariant object recognition on a 720×480 video stream at 60 fps. This work is part of a commercial project, and the system is designed to recognize arbitrary pre-trained objects while occupying a small physical volume and consuming little power.
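For intuition, the following is a minimal software sketch of the scale-invariant detection loop the abstract describes: build an image pyramid by repeatedly down-scaling the frame, slide a fixed-size window over every pyramid level, and pass each block to a classifier. The scale factor, window size, stride, and the `classify` stub are illustrative assumptions, not parameters taken from the paper; the actual system performs these steps in dedicated FPGA hardware.

```python
import numpy as np

def downscale(img: np.ndarray, factor: float = 0.8) -> np.ndarray:
    """Nearest-neighbour down-scaling; the FPGA system uses its own scaler."""
    h, w = img.shape[:2]
    new_h, new_w = int(h * factor), int(w * factor)
    rows = (np.arange(new_h) / factor).astype(int)
    cols = (np.arange(new_w) / factor).astype(int)
    return img[rows][:, cols]

def build_pyramid(frame: np.ndarray, min_size: int = 32, factor: float = 0.8):
    """Successively down-scale the frame until it is smaller than the CNN input."""
    level = frame
    while min(level.shape[:2]) >= min_size:
        yield level
        level = downscale(level, factor)

def classify(block: np.ndarray) -> float:
    """Placeholder for the CNN core; returns a detection score."""
    return 0.0  # the real system evaluates the trained CNN here

def detect(frame: np.ndarray, win: int = 32, stride: int = 8, thresh: float = 0.5):
    """Scan every pyramid level with a fixed window and collect detections."""
    detections = []
    for scale_idx, level in enumerate(build_pyramid(frame, min_size=win)):
        h, w = level.shape[:2]
        for y in range(0, h - win + 1, stride):
            for x in range(0, w - win + 1, stride):
                score = classify(level[y:y + win, x:x + win])
                if score > thresh:
                    detections.append((scale_idx, x, y, score))
    return detections

if __name__ == "__main__":
    # One 720x480 grey-level frame, matching the paper's video format.
    frame = np.zeros((480, 720), dtype=np.uint8)
    print(len(detect(frame)), "detections")
```

As a rough consistency check on the reported figures, a throughput above 170,000 classifications per second at 60 fps corresponds to roughly 2,800 classified image blocks per video frame, spread across all pyramid levels.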