Spatio-temporal feature based VLAD for efficient video retrieval

2013 Fourth National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG). Pub Date: 2013-12-01. DOI: 10.1109/NCVPRIPG.2013.6776268

M. K. Reddy, Sahil Arora, R. Venkatesh Babu


Citations: 6

Abstract



Compact representation of visual content has emerged as an important topic in large-scale image and video retrieval. The recently proposed Vector of Locally Aggregated Descriptors (VLAD) has been shown to outperform other existing retrieval techniques. In this paper, we propose two spatio-temporal features for constructing VLAD vectors for videos in the context of large-scale video retrieval: i) Local Histogram of Oriented Optical Flow (LHOOF), and ii) Space-Time Invariant Points (STIP). Given a query video, our aim is to retrieve similar videos from the database. Experiments are conducted on the UCF50 and HMDB51 datasets, which pose challenges such as camera motion, viewpoint variation, and large intra-class variation. The performance of the proposed features is compared with that of a SIFT-based spatial feature. Mean average precision (MAP) shows that the proposed spatio-temporal features achieve better retrieval performance than the spatial feature.
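The abstract does not spell out the encoding or evaluation steps, so the following is only a minimal sketch of how a VLAD vector and the average precision for one query are commonly computed, not the authors' implementation. It assumes a codebook of cluster centers has already been learned (for example by k-means) over local descriptors such as LHOOF, STIP, or SIFT, and that a retrieved video counts as relevant when it shares the query's class label. The names `vlad_encode` and `average_precision` are hypothetical, and the final L2 normalization is one common choice rather than something the source confirms.

```python
import numpy as np


def vlad_encode(descriptors, centers):
    """Aggregate local descriptors (n, d) into one VLAD vector of length k * d.

    `centers` is an assumed, precomputed codebook of shape (k, d).
    """
    k, d = centers.shape
    # Squared Euclidean distance from every descriptor to every center: (n, k)
    dists = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)

    vlad = np.zeros((k, d))
    for i in range(k):
        assigned = descriptors[nearest == i]
        if len(assigned):
            # Sum of residuals between the descriptors and their assigned center
            vlad[i] = (assigned - centers[i]).sum(axis=0)

    vlad = vlad.ravel()
    norm = np.linalg.norm(vlad)
    # L2-normalize so that vectors from videos of different lengths are comparable
    return vlad / norm if norm > 0 else vlad


def average_precision(ranked_labels, query_label):
    """Average precision for one query, given class labels of the ranked results.

    Relevance is assumed to mean "same class label as the query".
    """
    relevant = np.asarray(ranked_labels) == query_label
    if not relevant.any():
        return 0.0
    hits = np.cumsum(relevant)                 # relevant items seen up to each rank
    ranks = np.arange(1, len(relevant) + 1)
    # Precision measured at each rank where a relevant item appears
    return float((hits[relevant] / ranks[relevant]).mean())
```

Under these assumptions, database videos would be ranked by the distance between their VLAD vectors and the query's VLAD vector, and MAP would be the mean of `average_precision` over all query videos.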


Source journal

2013 Fourth National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG)

Self-citation rate

0.00%

Number of published papers