Video Object Detection Using Motion Context and Feature Aggregation

2020 International Conference on Information and Communication Technology Convergence (ICTC). Pub Date: 2020-10-21. DOI: 10.1109/ICTC49870.2020.9289386

Jaekyum Kim, Junho Koh, J. Choi

Abstract

Deep learning has recently led to significant improvements in object detection accuracy. Many object detection schemes are designed to process each frame independently. However, in many applications, object detection is performed on video data, which consists of a sequence of image frames, so detection accuracy can be improved by exploiting the temporal context of the video sequence. In this paper, we propose a novel video object detection method that exploits both the motion context of the object and spatio-temporally aggregated features to enhance video object detection performance. First, the motion context of the object is extracted by applying a correlation operator between the feature maps of two adjacent frames. In addition to generating the motion context, the spatial feature maps of N adjacent frames are aggregated by a gated attention network to boost the quality of the feature map.
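The two operations named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `local_correlation` and `gated_aggregate`, the displacement range `max_disp`, the normalization by channel count, and the softmax gating over frames are all assumptions made for illustration. In the paper the gate values would come from a learned attention network; here they are simply passed in as `gate_logits`.

```python
import numpy as np


def local_correlation(feat_a, feat_b, max_disp=1):
    """Correlate two feature maps of shape (C, H, W) from adjacent frames.

    For every displacement (dy, dx) within +/- max_disp (a hypothetical
    parameter), take the channel-wise dot product between feat_a at (y, x)
    and feat_b at (y + dy, x + dx). Returns an array of shape (D, H, W)
    with D = (2 * max_disp + 1) ** 2.
    """
    c, h, w = feat_a.shape
    # Zero-pad the spatial dimensions so shifted windows stay in bounds.
    padded = np.pad(feat_b, ((0, 0), (max_disp, max_disp), (max_disp, max_disp)))
    maps = []
    for dy in range(-max_disp, max_disp + 1):
        for dx in range(-max_disp, max_disp + 1):
            y0, x0 = max_disp + dy, max_disp + dx
            shifted = padded[:, y0:y0 + h, x0:x0 + w]
            # Dot product over channels, normalized by channel count (assumed).
            maps.append((feat_a * shifted).sum(axis=0) / c)
    return np.stack(maps)


def gated_aggregate(feats, gate_logits):
    """Aggregate N feature maps (N, C, H, W) with per-pixel gates (N, H, W).

    The gate logits are assumed to be produced elsewhere (for example by a
    small attention network); here they are turned into weights with a
    softmax over the N frames and used for a weighted sum.
    """
    logits = gate_logits - gate_logits.max(axis=0, keepdims=True)  # stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=0, keepdims=True)
    # Broadcast the weights over the channel axis and sum over frames.
    return (weights[:, None] * feats).sum(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame_t = rng.standard_normal((4, 5, 6))
    frame_prev = rng.standard_normal((4, 5, 6))
    motion_context = local_correlation(frame_t, frame_prev, max_disp=1)
    print(motion_context.shape)

    stack = rng.standard_normal((3, 4, 5, 6))  # N = 3 adjacent frames
    aggregated = gated_aggregate(stack, np.zeros((3, 5, 6)))
    print(aggregated.shape)
```

With all gate logits equal, `gated_aggregate` reduces to a plain average over the N frames, which is a convenient sanity check before plugging in learned gates.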
