{"title":"Adaptive Temporal Segmentation for Action Recognition","authors":"Zhiyu Chen, Yangwei Gu, Chunhua Deng, Ziqi Zhu","doi":"10.1109/SPAC49953.2019.237869","DOIUrl":null,"url":null,"abstract":"Learning deep representations have been widely used in action recognition task. However, the features extracted by deep convolutional neural networks (CNNs) have many redundant information. This paper aims to discover the relevance between temporal features extracted by CNNs. Different fromTemporal Segment Networks (TSN) to randomly select video clips. Based on the matrix-based Rényi’s α-entropy, we estimate the mutual information between temporal domain features. Through our experiments, we propose an adaptive temporal segmentation scheme to represent the entire videos. We also combine the features of RGB and optical flow frames extracted by 3D ConvNets to verify the complementary information between them. We show that the proposed approach achieves 94.4 and 72.8 percent accuracy, in the UCF- 101 and HMDB-51 datasets.","PeriodicalId":410003,"journal":{"name":"2019 International Conference on Security, Pattern Analysis, and Cybernetics (SPAC)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 International Conference on Security, Pattern Analysis, and Cybernetics (SPAC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SPAC49953.2019.237869","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Learned deep representations are widely used in action recognition. However, the features extracted by deep convolutional neural networks (CNNs) contain much redundant information. This paper aims to discover the relevance between temporal features extracted by CNNs. Unlike Temporal Segment Networks (TSN), which select video clips randomly, we estimate the mutual information between temporal-domain features using the matrix-based Rényi's α-entropy. Guided by these experiments, we propose an adaptive temporal segmentation scheme to represent entire videos. We also combine RGB and optical-flow features extracted by 3D ConvNets to verify the complementary information between the two modalities. The proposed approach achieves 94.4% accuracy on UCF-101 and 72.8% on HMDB-51.
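For readers unfamiliar with the estimator the abstract refers to, the following is a minimal NumPy sketch of mutual information computed from the matrix-based Rényi's α-entropy (in the style of Sanchez Giraldo et al.'s formulation: entropy from the eigenvalues of a trace-normalized Gram matrix, joint entropy from the Hadamard product of two Gram matrices). The RBF kernel, the bandwidth `sigma`, and `alpha = 2` are illustrative assumptions, not the paper's reported configuration.

```python
import numpy as np

def gram_matrix(X, sigma=1.0):
    """RBF-kernel Gram matrix of row samples X, normalized to unit trace."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # pairwise squared distances
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    return K / np.trace(K)  # eigenvalues now sum to 1, like a probability vector

def renyi_entropy(A, alpha=2.0):
    """S_alpha(A) = 1/(1-alpha) * log2(sum_i lambda_i(A)^alpha)."""
    eigvals = np.linalg.eigvalsh(A)
    eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negative values
    return np.log2(np.sum(eigvals ** alpha)) / (1.0 - alpha)

def joint_renyi_entropy(A, B, alpha=2.0):
    """Joint entropy via the Hadamard product, renormalized to unit trace."""
    AB = A * B
    return renyi_entropy(AB / np.trace(AB), alpha)

def mutual_information(X, Y, alpha=2.0, sigma=1.0):
    """I_alpha(X; Y) = S_alpha(A) + S_alpha(B) - S_alpha(A, B)."""
    A, B = gram_matrix(X, sigma), gram_matrix(Y, sigma)
    return renyi_entropy(A, alpha) + renyi_entropy(B, alpha) - joint_renyi_entropy(A, B, alpha)

# Hypothetical usage: X and Y would be CNN feature vectors from two temporal
# segments of the same video (one row per sample); a high I(X; Y) suggests the
# segments are redundant and could be merged by an adaptive segmentation scheme.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 128))
Y = X + 0.1 * rng.normal(size=(64, 128))  # nearly duplicated features
print(mutual_information(X, Y))
```

Under this reading, segments whose features share high estimated mutual information carry overlapping evidence, which is the kind of temporal redundancy the proposed adaptive segmentation is meant to exploit.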