View-invariant feature discovering for multi-camera human action recognition
Hong Lin, L. Chaisorn, Yongkang Wong, Anan Liu, Yuting Su, M. Kankanhalli
2014 IEEE 16th International Workshop on Multimedia Signal Processing (MMSP), published 2014-11-20
DOI: 10.1109/MMSP.2014.6958807
Citations: 5
Abstract
Intelligent video surveillance systems are built to automatically detect events of interest, with a particular emphasis on object tracking and behavior understanding. In this paper, we focus on the task of human action recognition in a surveillance environment, specifically in a multi-camera monitoring scene. Although many approaches have succeeded in recognizing human actions from video sequences, they are designed for a single view and are generally not robust to viewpoint changes. Human action recognition across different views remains challenging due to the large variations from one view to another. We present a framework that transfers action models learned in one view (the source view) to another view (the target view). First, local space-time interest point features and global shape-flow features are extracted as low-level features, and a hybrid Bag-of-Words model is built for each action sequence. The data distributions of relevant actions from the source and target views are then linked via a cross-view discriminative dictionary learning method. Through the view-adaptive dictionary pair learned by this method, data from the source and target views can be mapped into a common, view-invariant space. Furthermore, we extend the framework to transfer action models from multiple source views to one target view when multiple source views are available. Experiments on the IXMAS human action dataset, which contains videos captured from five viewpoints, show the efficacy of our framework.
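The sketch below illustrates the shared-code intuition behind a view-adaptive dictionary pair: paired Bag-of-Words histograms of the same action instance, observed from the source and target cameras, are forced to share one sparse code, and that code serves as the view-invariant representation. This is a minimal illustration using scikit-learn and random placeholder histograms, not the paper's exact discriminative formulation; all variable names and parameter values are assumptions.

```python
# Hedged sketch: a dictionary pair (D_src, D_tgt) with a shared sparse code
# can be obtained by stacking paired source/target histograms and running
# standard dictionary learning on the stacked features.
import numpy as np
from sklearn.decomposition import DictionaryLearning, sparse_encode

rng = np.random.default_rng(0)
n_sequences, n_words, n_atoms = 60, 200, 32   # illustrative sizes only

# Paired hybrid BoW histograms of the same actions seen from two views
# (random placeholders standing in for real IXMAS features).
X_src = rng.random((n_sequences, n_words))
X_tgt = rng.random((n_sequences, n_words))

# Learning one dictionary on the stacked views is equivalent to learning a
# dictionary pair that shares a single sparse code per action instance.
X_stacked = np.hstack([X_src, X_tgt])                  # (n_sequences, 2 * n_words)
learner = DictionaryLearning(n_components=n_atoms, alpha=1.0,
                             max_iter=20, random_state=0)
shared_codes = learner.fit_transform(X_stacked)        # view-invariant codes
D_src = learner.components_[:, :n_words]               # source-view dictionary
D_tgt = learner.components_[:, n_words:]               # target-view dictionary

# At test time, a target-view sequence is mapped into the common space by
# sparse coding against its own dictionary.
x_new_tgt = rng.random((1, n_words))
code_new = sparse_encode(x_new_tgt, D_tgt, alpha=1.0)
print(code_new.shape)                                  # (1, n_atoms)
```

In this simplified view, an action classifier trained on the shared codes of source-view data can be applied directly to codes computed from target-view data, which is the transfer scenario the abstract describes.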