Energy-Based Global Ternary Image for Action Recognition Using Sole Depth Sequences

2016 Fourth International Conference on 3D Vision (3DV) | Pub Date: 2016-10-01 | DOI: 10.1109/3DV.2016.14

Mengyuan Liu, Hong Liu, Chen Chen, M. Najafian


Citations: 12

Abstract

To recognize actions efficiently from depth sequences, we propose a novel feature, the Global Ternary Image (GTI), which implicitly encodes both motion regions and motion directions between consecutive depth frames by recording changes in depth pixels. Each pixel in a GTI takes one of three states, positive, negative or neutral, which represent increased, decreased and unchanged depth values, respectively. Because GTI is sensitive to the subject's speed, we obtain an energy-based GTI (E-GTI) by extracting GTI from pairs of depth frames with equal motion energy. To incorporate temporal information across depth frames, we extract E-GTIs under multiple motion-energy settings. Noise is then suppressed by describing the E-GTIs with the Radon Transform (RT). The 3D action representation is formed by feeding the hierarchical combination of RTs to the Bag of Visual Words (BoVW) model. Extensive experiments on four benchmark datasets, MSRAction3D, DHA, MSRGesture3D and SKIG, show that the hierarchical E-GTI outperforms existing methods in 3D action recognition. We also test the approach on an extended MSRAction3D dataset to verify its robustness to partial occlusions, noise and speed variations.
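
As a rough illustration of the idea described in the abstract, the sketch below builds a ternary image from two depth frames and then pairs frames by accumulated motion energy. It is not the authors' implementation. The abstract does not define motion energy, so the definition used here (the number of pixels whose depth changed) is an assumption, as are the function names, the `threshold` parameter, and the `energy_step` parameter. The Radon Transform and BoVW stages are not shown.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[Sequence[float]]  # a depth frame as rows of depth values


def global_ternary_image(prev: Frame, curr: Frame,
                         threshold: float = 0.0) -> List[List[int]]:
    """Return +1 where depth increased, -1 where it decreased, 0 otherwise.

    `threshold` is a hypothetical tolerance; with the default of 0 a pixel
    is neutral only when its depth value is exactly unchanged.
    """
    gti = []
    for row_prev, row_curr in zip(prev, curr):
        out_row = []
        for d_prev, d_curr in zip(row_prev, row_curr):
            diff = d_curr - d_prev
            if diff > threshold:
                out_row.append(1)    # positive state: depth increased
            elif diff < -threshold:
                out_row.append(-1)   # negative state: depth decreased
            else:
                out_row.append(0)    # neutral state: depth unchanged
        gti.append(out_row)
    return gti


def motion_energy(prev: Frame, curr: Frame, threshold: float = 0.0) -> int:
    """Assumed energy measure: count of non-neutral pixels in the GTI."""
    gti = global_ternary_image(prev, curr, threshold)
    return sum(abs(value) for row in gti for value in row)


def equal_energy_pairs(frames: Sequence[Frame], energy_step: int,
                       threshold: float = 0.0) -> List[Tuple[int, int]]:
    """Pair frame indices so each pair spans at least `energy_step` energy.

    Energy is accumulated over consecutive frames; once it reaches
    `energy_step`, the pair (start, current) is emitted and accumulation
    restarts, so fast and slow motions yield comparable frame pairs.
    """
    pairs = []
    start = 0
    accumulated = 0
    for k in range(1, len(frames)):
        accumulated += motion_energy(frames[k - 1], frames[k], threshold)
        if accumulated >= energy_step:
            pairs.append((start, k))
            start = k
            accumulated = 0
    return pairs


if __name__ == "__main__":
    frames = [
        [[0, 0, 0, 0]],
        [[1, 0, 0, 0]],
        [[1, 1, 0, 0]],
        [[2, 2, 2, 2]],
    ]
    for i, j in equal_energy_pairs(frames, energy_step=2):
        print((i, j), global_ternary_image(frames[i], frames[j]))
```

Calling `equal_energy_pairs` with several values of `energy_step` would give one set of frame pairs per setting, which loosely mirrors the "multiple settings of motion energy" mentioned in the abstract.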
