{"title":"MULTI-Stream Graph Convolutional Networks with Efficient spatial-temporal Attention for Skeleton-based Action Recognition","authors":"Yueting Hui, Wensheng Sun","doi":"10.1145/3556677.3556692","DOIUrl":null,"url":null,"abstract":"In skeleton-based action recognition, graph convolutional networks (GCN) based methods have achieved remarkable performance by building skeleton coordinates into spatial-temporal graphs and explored the relationship between body joints. ST-GCN [19] proposed by Yan et al is regarded as a heuristic method, which firstly introduced GCN to skeleton-based action recognition. However, it applied graph convolution on joints of each frame equally. Less contribution joints caused interference in generating intermediate feature maps. We designed a spatial-temporal attention module to capture significant feature in spatial and temporal dimension simultaneously. Moreover, we adopted inverted bottleneck temporal convolutional networks to decrease computational amount and learned more feature with residual construction. Besides useful message in joints, bones and their movement also contain learnable information for analyzing action categories. We input data to a multi-stream framework. Finally, we demonstrated the efficiency of our proposed MSEA-GCN on NTU RGB+D datasets.","PeriodicalId":118446,"journal":{"name":"International Conference on Deep Learning Technologies","volume":"145 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Conference on Deep Learning Technologies","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3556677.3556692","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
In skeleton-based action recognition, graph convolutional network (GCN) based methods have achieved remarkable performance by constructing spatial-temporal graphs from skeleton coordinates and exploring the relationships between body joints. ST-GCN [19], proposed by Yan et al., is regarded as a seminal method that first introduced GCN to skeleton-based action recognition. However, it applies graph convolution equally to the joints of every frame, so joints that contribute little to the action introduce interference when generating intermediate feature maps. We design a spatial-temporal attention module to capture significant features in the spatial and temporal dimensions simultaneously. Moreover, we adopt inverted-bottleneck temporal convolutional networks to reduce computation and learn richer features through residual connections. Besides the useful information carried by joints, bones and their motions also contain learnable cues for recognizing action categories, so we feed these modalities into a multi-stream framework. Finally, we demonstrate the efficiency of our proposed MSEA-GCN on the NTU RGB+D datasets.
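The abstract does not give implementation details, so the following is only a minimal PyTorch-style sketch of the two components it names: a spatial-temporal attention gate that re-weights joints and frames jointly, and an inverted-bottleneck temporal convolution block with a residual connection. All module names, channel sizes, and the exact attention formulation below are illustrative assumptions, not the authors' implementation.

# Minimal sketch (assumption): names, channel sizes, and the attention
# formulation are illustrative guesses, not the paper's actual code.
import torch
import torch.nn as nn

class SpatialTemporalAttention(nn.Module):
    """Gate features over joints (spatial) and frames (temporal) simultaneously."""
    def __init__(self, channels):
        super().__init__()
        # 1x1 convolutions producing per-joint and per-frame attention scores
        self.spatial_att = nn.Conv2d(channels, 1, kernel_size=1)
        self.temporal_att = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x):  # x: (N, C, T, V) = batch, channels, frames, joints
        s = torch.sigmoid(self.spatial_att(x.mean(dim=2, keepdim=True)))   # (N, 1, 1, V)
        t = torch.sigmoid(self.temporal_att(x.mean(dim=3, keepdim=True)))  # (N, 1, T, 1)
        return x * s * t  # re-weight joints and frames jointly

class InvertedBottleneckTCN(nn.Module):
    """Temporal convolution with an expand -> depthwise temporal conv -> project
    structure and a residual connection, in the spirit of inverted bottlenecks."""
    def __init__(self, channels, expansion=4, kernel_size=9):
        super().__init__()
        hidden = channels * expansion
        pad = (kernel_size - 1) // 2
        self.block = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1),           # expand
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, hidden, kernel_size=(kernel_size, 1),
                      padding=(pad, 0), groups=hidden),           # depthwise temporal conv
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=1),           # project back
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return torch.relu(x + self.block(x))                      # residual connection

# Toy usage: 2 samples, 64 channels, 30 frames, 25 joints (NTU skeleton)
x = torch.randn(2, 64, 30, 25)
x = SpatialTemporalAttention(64)(x)
x = InvertedBottleneckTCN(64)(x)
print(x.shape)  # torch.Size([2, 64, 30, 25])

In a multi-stream framework as described above, separate instances of such a network would typically process joint, bone, and motion inputs, with their prediction scores fused at the end; the fusion scheme is likewise not specified in the abstract.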