Improving codebook generation for action recognition using a mixture of Asymmetric Gaussians

Tarek Elguebaly, N. Bouguila
DOI: 10.1109/CIMSIVP.2014.7013267
Published in: 2014 IEEE Symposium on Computational Intelligence for Multimedia, Signal and Vision Processing (CIMSIVP), December 2014
Citations: 6

Abstract

Human activity recognition is a crucial area of computer vision research and applications. Its goal is to automatically analyze and interpret ongoing events and their context from video data. Recently, the bag-of-visual-words (BoVW) approach has been widely applied to human action recognition. Generally, a representative corpus of videos is used to build the visual-words dictionary, or codebook, with a simple k-means clustering approach. This visual dictionary is then used to quantize the extracted features by assigning each input descriptor the label of its closest cluster centroid under the Euclidean distance. Thus, each video can be represented as a frequency histogram over visual words. However, the BoVW approach has several limitations, such as its need for a predefined codebook size, its dependence on the chosen set of visual words, and its use of hard-assignment clustering for histogram creation. In this paper, we address these issues by using a mixture of asymmetric Gaussians to build the codebook. Our method can identify the best dictionary size in an unsupervised manner, represent the set of input feature vectors by an estimate of their density distribution, and allow soft assignments. Furthermore, we validate the efficiency of the proposed algorithm for human action recognition.
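The pipeline described above can be illustrated with a minimal sketch. The hard-assignment path (k-means codebook, nearest-centroid voting, frequency histogram) follows the classic BoVW recipe from the abstract; the soft-assignment path uses a univariate-per-dimension asymmetric Gaussian density with hand-set parameters, since the paper's exact multivariate parameterization and its unsupervised model-selection procedure are not reproduced here. All function names and the mixture parameters are illustrative assumptions, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k, iters=20):
    # Simple k-means: the learned centers play the role of "visual words".
    centers = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return centers

def hard_histogram(F, centers):
    # Classic BoVW: each descriptor votes only for its nearest visual word
    # (Euclidean distance), yielding a normalized frequency histogram.
    d = np.linalg.norm(F[:, None] - centers[None], axis=2)
    h = np.bincount(d.argmin(axis=1), minlength=len(centers)).astype(float)
    return h / h.sum()

def asym_gauss_pdf(x, mu, s_left, s_right):
    # 1-D asymmetric Gaussian: separate standard deviations on each side
    # of the mode; sqrt(2/pi)/(s_left+s_right) normalizes the density.
    s = np.where(x < mu, s_left, s_right)
    return (np.sqrt(2.0 / np.pi) / (s_left + s_right)
            * np.exp(-((x - mu) ** 2) / (2.0 * s ** 2)))

def soft_histogram(F, mus, sl, sr, weights):
    # Soft assignment: each descriptor spreads its vote over all mixture
    # components according to posterior responsibilities (dimensions are
    # treated as independent in this sketch).
    resp = weights * np.prod(asym_gauss_pdf(F[:, None, :], mus, sl, sr), axis=2)
    resp /= resp.sum(axis=1, keepdims=True)
    return resp.mean(axis=0)

# Synthetic 2-D "descriptors" from two clusters, standing in for real
# video features such as space-time interest point descriptors.
feats = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])
codebook = kmeans(feats, k=2)
h_hard = hard_histogram(feats, codebook)

# Hand-set (not fitted) asymmetric-Gaussian mixture parameters, purely
# to show the soft-assignment mechanics.
sl = np.full_like(codebook, 1.0)
sr = np.full_like(codebook, 1.5)
h_soft = soft_histogram(feats, codebook, sl, sr, np.array([0.5, 0.5]))
```

Both histograms sum to one; the difference is that every descriptor contributes fractionally to every bin of `h_soft`, so descriptors near a cluster boundary are no longer forced into a single word, which is the limitation of hard assignment the paper targets.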