Real-time video object recognition using convolutional neural network

Byungik Ahn
{"title":"Real-time video object recognition using convolutional neural network","authors":"Byungik Ahn","doi":"10.1109/IJCNN.2015.7280718","DOIUrl":null,"url":null,"abstract":"A convolutional neural network (CNN) is implemented on a field-programmable gate array (FPGA) and used for recognizing objects in real-time video streams. In this system, an image pyramid is constructed by successively down-scaling the input video stream. Image blocks are extracted from the image pyramid and classified by the CNN core. The detected parts are then marked on the output video frames. The CNN core is composed of six hardware neurons and two receptor units. The hardware neurons are designed as fully-pipelined digital circuits synchronized with the system clock, and are used to compute the model neurons in a time-sharing manner. The receptor units scan the input image for local receptive fields and continuously supply data to the hardware neurons as inputs. The CNN core module is controlled according to the contents of a table describing the sequence of computational stages and containing the system parameters required to control each stage. The use of this table makes the hardware system more flexible, and various CNN configurations can be accommodated without re-designing the system. The system implemented on a mid-range FPGA achieves a computational speed greater than 170,000 classifications per second, and performs scale-invariant object recognition from a 720×480 video stream at a speed of 60 fps. This work is a part of a commercial project, and the system is targeted for recognizing any pre-trained objects with a small physical volume and low power consumption.","PeriodicalId":6539,"journal":{"name":"2015 International Joint Conference on Neural Networks (IJCNN)","volume":"92 1","pages":"1-7"},"PeriodicalIF":0.0000,"publicationDate":"2015-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"17","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 International Joint Conference on Neural Networks (IJCNN)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IJCNN.2015.7280718","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 17

Abstract

A convolutional neural network (CNN) is implemented on a field-programmable gate array (FPGA) and used for recognizing objects in real-time video streams. In this system, an image pyramid is constructed by successively down-scaling the input video stream. Image blocks are extracted from the image pyramid and classified by the CNN core. The detected parts are then marked on the output video frames. The CNN core is composed of six hardware neurons and two receptor units. The hardware neurons are designed as fully-pipelined digital circuits synchronized with the system clock, and are used to compute the model neurons in a time-sharing manner. The receptor units scan the input image for local receptive fields and continuously supply data to the hardware neurons as inputs. The CNN core module is controlled according to the contents of a table describing the sequence of computational stages and containing the system parameters required to control each stage. The use of this table makes the hardware system more flexible, and various CNN configurations can be accommodated without re-designing the system. The system implemented on a mid-range FPGA achieves a computational speed greater than 170,000 classifications per second, and performs scale-invariant object recognition on a 720×480 video stream at 60 fps. This work is part of a commercial project, and the system is designed to recognize arbitrary pre-trained objects while occupying a small physical volume and consuming little power.
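For intuition, the following is a minimal software sketch of the scale-invariant detection loop the abstract describes: build an image pyramid by repeatedly down-scaling the frame, slide a fixed-size window over every pyramid level, and pass each block to a classifier. The scale factor, window size, stride, and the `classify` stub are illustrative assumptions, not parameters taken from the paper; the actual system performs these steps in dedicated FPGA hardware.

```python
import numpy as np

def downscale(img: np.ndarray, factor: float = 0.8) -> np.ndarray:
    """Nearest-neighbour down-scaling; the FPGA system uses its own scaler."""
    h, w = img.shape[:2]
    new_h, new_w = int(h * factor), int(w * factor)
    rows = (np.arange(new_h) / factor).astype(int)
    cols = (np.arange(new_w) / factor).astype(int)
    return img[rows][:, cols]

def build_pyramid(frame: np.ndarray, min_size: int = 32, factor: float = 0.8):
    """Successively down-scale the frame until it is smaller than the CNN input."""
    level = frame
    while min(level.shape[:2]) >= min_size:
        yield level
        level = downscale(level, factor)

def classify(block: np.ndarray) -> float:
    """Placeholder for the CNN core; returns a detection score."""
    return 0.0  # the real system evaluates the trained CNN here

def detect(frame: np.ndarray, win: int = 32, stride: int = 8, thresh: float = 0.5):
    """Scan every pyramid level with a fixed window and collect detections."""
    detections = []
    for scale_idx, level in enumerate(build_pyramid(frame, min_size=win)):
        h, w = level.shape[:2]
        for y in range(0, h - win + 1, stride):
            for x in range(0, w - win + 1, stride):
                score = classify(level[y:y + win, x:x + win])
                if score > thresh:
                    detections.append((scale_idx, x, y, score))
    return detections

if __name__ == "__main__":
    # One 720x480 grey-level frame, matching the paper's video format.
    frame = np.zeros((480, 720), dtype=np.uint8)
    print(len(detect(frame)), "detections")
```

As a rough consistency check on the reported figures, a throughput above 170,000 classifications per second at 60 fps corresponds to roughly 2,800 classified image blocks per video frame, spread across all pyramid levels.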