{"title":"Temporal Aggregation of Visual Features for Large-Scale Image-to-Video Retrieval","authors":"Noa García","doi":"10.1145/3206025.3206083","DOIUrl":null,"url":null,"abstract":"In this research we study the specific task of image-to-video retrieval, in which static pictures are used to find a specific timestamp or frame within a collection of videos. The inner temporal structure of video data consists of a sequence of highly correlated images or frames, commonly reproduced at rates of 24 to 30 frames per second. To perform large-scale retrieval, it is necessary to reduce the amount of data to be processed by exploiting the redundancy between these highly correlated images. In this work, we explore several techniques to aggregate visual temporal information from video data based on both standard local features and deep learning representations with the focus on the image-to-video retrieval task.","PeriodicalId":224132,"journal":{"name":"Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval","volume":"7 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3206025.3206083","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 9
Abstract
In this research, we study the task of image-to-video retrieval, in which a static image is used to locate a specific timestamp or frame within a collection of videos. The temporal structure of video data consists of a sequence of highly correlated images, or frames, commonly played back at rates of 24 to 30 frames per second. To perform large-scale retrieval, it is necessary to reduce the amount of data to be processed by exploiting the redundancy between these highly correlated frames. In this work, we explore several techniques for aggregating visual temporal information from video data, based on both standard local features and deep learning representations, with a focus on the image-to-video retrieval task.
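To make the aggregation idea concrete, below is a minimal sketch of one possible approach: per-frame descriptors are mean-pooled over fixed-size windows so that a query image is matched against far fewer vectors than there are frames. The window size, the choice of mean pooling, the feature dimensionality, and all function names here are illustrative assumptions, not the paper's actual method or configuration; the paper explores several aggregation techniques.

```python
# Sketch of temporal feature aggregation for image-to-video retrieval
# (illustrative only; window size and mean pooling are assumptions).
import numpy as np


def aggregate_windows(frame_feats: np.ndarray, window: int = 25) -> np.ndarray:
    """Mean-pool L2-normalized frame descriptors over consecutive windows.

    frame_feats: (num_frames, dim) array of per-frame features,
                 e.g., CNN activations or encoded local features.
    Returns a (num_windows, dim) array of aggregated, re-normalized vectors.
    """
    # L2-normalize each frame descriptor so pooling is not dominated
    # by high-magnitude frames.
    feats = frame_feats / (np.linalg.norm(frame_feats, axis=1, keepdims=True) + 1e-12)
    n = (len(feats) // window) * window  # drop the incomplete trailing window
    pooled = feats[:n].reshape(-1, window, feats.shape[1]).mean(axis=1)
    return pooled / (np.linalg.norm(pooled, axis=1, keepdims=True) + 1e-12)


def retrieve(query_feat: np.ndarray, video_index: np.ndarray,
             window: int = 25, fps: float = 25.0):
    """Return the approximate timestamp (seconds) of the best-matching window."""
    q = query_feat / (np.linalg.norm(query_feat) + 1e-12)
    scores = video_index @ q          # cosine similarity via dot product
    best = int(np.argmax(scores))
    return best * window / fps, float(scores[best])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulate 10 seconds of 25 fps video with 128-d frame descriptors.
    frames = rng.normal(size=(250, 128)).astype(np.float32)
    index = aggregate_windows(frames, window=25)  # 250 frames -> 10 vectors
    t, score = retrieve(frames[130], index)       # query with a known frame
    print(f"matched at ~{t:.1f}s (similarity {score:.3f})")
```

In this toy setup, a 250-frame clip is reduced to a 10-vector index, a 25x reduction in the data matched at query time, which is the kind of redundancy exploitation the abstract describes for scaling retrieval to large video collections.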