View-invariant feature discovering for multi-camera human action recognition

Hong Lin, L. Chaisorn, Yongkang Wong, Anan Liu, Yuting Su, M. Kankanhalli
{"title":"View-invariant feature discovering for multi-camera human action recognition","authors":"Hong Lin, L. Chaisorn, Yongkang Wong, Anan Liu, Yuting Su, M. Kankanhalli","doi":"10.1109/MMSP.2014.6958807","DOIUrl":null,"url":null,"abstract":"Intelligent video surveillance system is built to automatically detect events of interest, especially on object tracking and behavior understanding. In this paper, we focus on the task of human action recognition under surveillance environment, specifically in a multi-camera monitoring scene. Despite many approaches have achieved success in recognizing human action from video sequences, they are designed for single view and generally not robust against viewpoint invariant. Human action recognition across different views remains challenging due to the large variations from one view to another. We present a framework to solve the problem of transferring action models learned in one view (source view) to another view (target view). First, local space-time interest point feature and global shape-flow feature are extracted as low-level feature, followed by building the hybrid Bag-of-Words model for each action sequence. The data distribution of relevant actions from source view and target view are linked via a cross-view discriminative dictionary learning method. Through the view-adaptive dictionary pair learned by the method, the data from source and target view can be respectively mapped into a common space which is view-invariant. Furthermore, We extend our framework to transfer action models from multiple views to one view when there are multiple source views available. Experiments on the IXMAS human action dataset, which contains videos captured with five viewpoints, show the efficacy of our framework.","PeriodicalId":164858,"journal":{"name":"2014 IEEE 16th International Workshop on Multimedia Signal Processing (MMSP)","volume":"17 2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 IEEE 16th International Workshop on Multimedia Signal Processing (MMSP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MMSP.2014.6958807","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

Intelligent video surveillance systems are built to automatically detect events of interest, with particular emphasis on object tracking and behavior understanding. In this paper, we focus on the task of human action recognition in a surveillance environment, specifically in a multi-camera monitoring scene. Although many approaches have achieved success in recognizing human actions from video sequences, they are designed for a single view and are generally not robust to viewpoint changes. Human action recognition across different views remains challenging due to the large variations from one view to another. We present a framework to solve the problem of transferring action models learned in one view (the source view) to another view (the target view). First, local space-time interest point features and global shape-flow features are extracted as low-level features, followed by building a hybrid Bag-of-Words model for each action sequence. The data distributions of relevant actions from the source view and the target view are linked via a cross-view discriminative dictionary learning method. Through the view-adaptive dictionary pair learned by this method, data from the source and target views can be mapped into a common space that is view-invariant. Furthermore, we extend our framework to transfer action models from multiple source views to a single target view when multiple source views are available. Experiments on the IXMAS human action dataset, which contains videos captured from five viewpoints, show the efficacy of our framework.
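To illustrate the cross-view idea, the sketch below shows a simplified coupled dictionary pair learned from paired Bag-of-Words histograms of the same action instances observed from a source and a target camera. It is not the paper's exact discriminative formulation: corresponding samples are simply forced to share sparse codes by stacking the two views and learning one joint dictionary, whose halves act as the view-adaptive pair. All names, sizes, and parameters are illustrative assumptions.

```python
# Minimal sketch of a cross-view dictionary pair (assumed setup, not the paper's method).
import numpy as np
from sklearn.decomposition import DictionaryLearning, SparseCoder

rng = np.random.default_rng(0)
n_pairs, dim, n_atoms = 200, 300, 64          # paired sequences, BoW size, dictionary size

# X_src[i] and X_tgt[i] stand in for hybrid BoW histograms of the same action instance
X_src = rng.random((n_pairs, dim))
X_tgt = rng.random((n_pairs, dim))

# Stack the two views so every dictionary atom has a source half and a target half
X_joint = np.hstack([X_src, X_tgt])           # shape (n_pairs, 2*dim)
dl = DictionaryLearning(n_components=n_atoms, alpha=1.0, max_iter=20, random_state=0)
dl.fit(X_joint)
D_src, D_tgt = dl.components_[:, :dim], dl.components_[:, dim:]

# At test time, encode each view with its own half of the dictionary; the sparse
# codes then live in a shared, approximately view-invariant space.
code_src = SparseCoder(D_src, transform_algorithm="lasso_lars", transform_alpha=1.0).transform(X_src)
code_tgt = SparseCoder(D_tgt, transform_algorithm="lasso_lars", transform_alpha=1.0).transform(X_tgt)
print(code_src.shape, code_tgt.shape)         # (200, 64) (200, 64)
```

In this simplified form, a classifier trained on the source-view codes could be applied directly to target-view codes; the paper's framework additionally makes the dictionary learning discriminative and handles multiple source views.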