Semi-Binary Based Video Features for Activity Representation

2013 International Conference on Digital Image Computing: Techniques and Applications (DICTA). Pub Date: 2013-12-23. DOI: 10.1109/DICTA.2013.6691527

Sabanadesan Umakanthan, S. Denman, C. Fookes, S. Sridharan


Citations: 1

Abstract

Efficient and effective feature detection and representation is an important consideration when processing videos, and many applications, such as motion analysis, 3D scene understanding, and tracking, depend on it. Among the available feature description methods, local features are becoming increasingly popular for representing videos because of their simplicity and efficiency. While they achieve state-of-the-art performance with low computational complexity, their performance is still too limited for real-world applications. Furthermore, the rapid uptake of mobile devices has increased the demand for algorithms that can run with reduced memory and computational requirements. In this paper we propose a semi-binary feature detector-descriptor based on the BRISK detector, which can detect and represent videos with significantly reduced computational requirements while achieving performance comparable to state-of-the-art spatio-temporal feature descriptors. First, the BRISK feature detector is applied on a frame-by-frame basis to detect interest points; the detected key points are then compared against consecutive frames to identify significant motion. Key points with significant motion are encoded with the BRISK descriptor in the spatial domain and with a Motion Boundary Histogram in the temporal domain. This descriptor is not only lightweight but also has lower memory requirements because of the binary nature of the BRISK descriptor, which opens up the possibility of applications on handheld devices. We evaluate the performance of the detector-descriptor combination in the context of action classification, using a standard, popular bag-of-features framework with an SVM. Experiments are carried out on two popular datasets of varying complexity, and we demonstrate performance comparable to other descriptors at reduced computational complexity.
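The motion-filtering step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how "significant motion" is measured, so the criterion below (mean absolute intensity change in a small patch around each key point, compared with a fixed threshold) is an assumption made for illustration only. The names `patch_motion` and `filter_moving`, the patch radius, and the threshold value are all hypothetical, and the key points are supplied by hand rather than by a BRISK detector.

```python
from typing import List, Tuple

Frame = List[List[int]]   # grayscale frame as rows of pixel intensities
Point = Tuple[int, int]   # (x, y) key point location


def patch_motion(prev: Frame, curr: Frame, pt: Point, radius: int) -> float:
    """Mean absolute intensity change in a square patch centred on pt.

    Assumed motion measure; the paper's actual criterion is not given
    in the abstract.
    """
    x, y = pt
    height, width = len(curr), len(curr[0])
    total, count = 0, 0
    # Clip the patch to the frame borders.
    for r in range(max(0, y - radius), min(height, y + radius + 1)):
        for c in range(max(0, x - radius), min(width, x + radius + 1)):
            total += abs(curr[r][c] - prev[r][c])
            count += 1
    return total / count if count else 0.0


def filter_moving(prev: Frame, curr: Frame, keypoints: List[Point],
                  radius: int = 1, threshold: float = 10.0) -> List[Point]:
    """Keep only key points whose surrounding patch changed noticeably."""
    return [pt for pt in keypoints
            if patch_motion(prev, curr, pt, radius) > threshold]


# Toy example: two 6x6 frames that differ at a single pixel.
prev = [[0] * 6 for _ in range(6)]
curr = [row[:] for row in prev]
curr[1][1] = 200  # simulated motion near key point (1, 1)

# Hand-picked key points standing in for detector output.
moving = filter_moving(prev, curr, [(1, 1), (4, 4)])
```

In this toy example only the key point next to the changed pixel survives the filter. In the pipeline the abstract describes, the surviving key points would then be passed on to the BRISK descriptor and the Motion Boundary Histogram.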

