Online, Real-time Tracking and Recognition of Human Actions

Pradeep Natarajan, R. Nevatia
{"title":"在线、实时跟踪和识别人类行为","authors":"Pradeep Natarajan, R. Nevatia","doi":"10.1109/WMVC.2008.4544064","DOIUrl":null,"url":null,"abstract":"We present a top-down approach to simultaneously track and recognize articulated full-body human motion using learned action models that is robust to variations in style, lighting, background,occlusion and viewpoint. To this end, we introduce the hierarchical variable transition hidden Markov model (HVT-HMM) that is a three-layered extension of the variable transition hidden Markov model (VTHMM). The top-most layer of the HVT-HMM represents the composite actions and contains a single Markov chain, the middle layer represents the primitive actions which are modeled using a VTHMM whose state transition probability varies with time and the bottom-most layer represents the body pose transitions using a HMM. We represent the pose using a 23D body model and present efficient learning and decoding algorithms for HVT-HMM. Further, in classical Viterbi decoding the entire sequence must be seen before the state at any instant can be recognized and hence can potentially have large latency for long video sequences. In order to address this we use a variable window approach to decoding with very low latency. We demonstrate our methods first in a domain for recognizing two-handed gestures and then in a domain with actions involving articulated motion of the entire body. Our approach shows 90-100% action recognition in both domains and runs at real-time (ap 30 fps) with very low average latency (ap 2 frames).","PeriodicalId":150666,"journal":{"name":"2008 IEEE Workshop on Motion and video Computing","volume":"31 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2008-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"58","resultStr":"{\"title\":\"Online, Real-time Tracking and Recognition of Human Actions\",\"authors\":\"Pradeep Natarajan, R. 
Nevatia\",\"doi\":\"10.1109/WMVC.2008.4544064\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We present a top-down approach to simultaneously track and recognize articulated full-body human motion using learned action models that is robust to variations in style, lighting, background,occlusion and viewpoint. To this end, we introduce the hierarchical variable transition hidden Markov model (HVT-HMM) that is a three-layered extension of the variable transition hidden Markov model (VTHMM). The top-most layer of the HVT-HMM represents the composite actions and contains a single Markov chain, the middle layer represents the primitive actions which are modeled using a VTHMM whose state transition probability varies with time and the bottom-most layer represents the body pose transitions using a HMM. We represent the pose using a 23D body model and present efficient learning and decoding algorithms for HVT-HMM. Further, in classical Viterbi decoding the entire sequence must be seen before the state at any instant can be recognized and hence can potentially have large latency for long video sequences. In order to address this we use a variable window approach to decoding with very low latency. We demonstrate our methods first in a domain for recognizing two-handed gestures and then in a domain with actions involving articulated motion of the entire body. 
Our approach shows 90-100% action recognition in both domains and runs at real-time (ap 30 fps) with very low average latency (ap 2 frames).\",\"PeriodicalId\":150666,\"journal\":{\"name\":\"2008 IEEE Workshop on Motion and video Computing\",\"volume\":\"31 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2008-01-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"58\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2008 IEEE Workshop on Motion and video Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/WMVC.2008.4544064\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2008 IEEE Workshop on Motion and video Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/WMVC.2008.4544064","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 58

Abstract

We present a top-down approach to simultaneously track and recognize articulated full-body human motion using learned action models that is robust to variations in style, lighting, background, occlusion, and viewpoint. To this end, we introduce the hierarchical variable transition hidden Markov model (HVT-HMM), a three-layered extension of the variable transition hidden Markov model (VTHMM). The top-most layer of the HVT-HMM represents composite actions and contains a single Markov chain; the middle layer represents primitive actions, modeled using a VTHMM whose state transition probability varies with time; and the bottom-most layer represents body pose transitions using an HMM. We represent the pose using a 23D body model and present efficient learning and decoding algorithms for the HVT-HMM. Further, in classical Viterbi decoding the entire sequence must be seen before the state at any instant can be recognized, which can lead to large latency for long video sequences. To address this, we use a variable-window approach to decoding with very low latency. We demonstrate our methods first in a domain for recognizing two-handed gestures and then in a domain with actions involving articulated motion of the entire body. Our approach achieves 90-100% action recognition in both domains and runs in real time (approximately 30 fps) with very low average latency (approximately 2 frames).
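The low-latency decoding idea can be illustrated with a fixed-lag variant of Viterbi decoding on an ordinary HMM: instead of waiting for the whole sequence, the decoder commits to the most likely state a few frames behind the current frame. This is a minimal sketch of that general idea, not the paper's variable-window algorithm or its HVT-HMM; the toy two-state model and all probability values below are illustrative assumptions.

```python
import math

def fixed_lag_viterbi(obs, pi, A, B, lag=2):
    """Viterbi decoding that outputs the state at frame t-lag as soon as
    frame t arrives, trading a small accuracy risk for low latency.
    pi: initial state probabilities; A: transition matrix; B: emission
    matrix over discrete observation symbols. Requires 0 < lag < len(obs)."""
    S = range(len(pi))
    log = math.log
    # log-probability of the best path ending in each state at the current frame
    delta = [log(pi[s]) + log(B[s][obs[0]]) for s in S]
    psi = []   # psi[k][s] = best predecessor (at time k) of state s at time k+1
    out = []   # committed states, in temporal order
    for t in range(1, len(obs)):
        nd, bp = [], []
        for s in S:
            p = max(S, key=lambda q: delta[q] + log(A[q][s]))
            bp.append(p)
            nd.append(delta[p] + log(A[p][s]) + log(B[s][obs[t]]))
        delta, psi = nd, psi + [bp]
        if t >= lag:
            # trace back `lag` steps from the currently best state
            # and commit the state at frame t - lag
            s = max(S, key=lambda q: delta[q])
            for k in range(t - 1, t - 1 - lag, -1):
                s = psi[k][s]
            out.append(s)
    # flush: full trace-back over the last `lag` frames
    s = max(S, key=lambda q: delta[q])
    tail = [s]
    for k in range(len(psi) - 1, len(psi) - lag, -1):
        s = psi[k][s]
        tail.append(s)
    return out + tail[::-1]

# Toy 2-state model with strongly state-tied emissions (illustrative values).
pi = [0.5, 0.5]
A = [[0.7, 0.3], [0.3, 0.7]]
B = [[0.9, 0.1], [0.1, 0.9]]
path = fixed_lag_viterbi([0, 0, 1, 1, 0], pi, A, B, lag=2)
print(path)  # → [0, 0, 1, 1, 0]
```

With strong emissions the committed path tracks the observations; the trade-off is that committing `lag` frames behind can disagree with full Viterbi when evidence arriving later than `lag` frames would have changed an earlier state.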