A Separable Spatial–Temporal Graph Learning Approach for Skeleton-Based Action Recognition
Authors: Hui Zheng; Ye-Sheng Zhao; Bo Zhang; Guo-Qiang Shang
Journal: IEEE Sensors Letters, vol. 8, no. 11, pp. 1-4
Publication date: 2024-10-07
DOI: 10.1109/LSENS.2024.3475515
URL: https://ieeexplore.ieee.org/document/10706715/
Citation count: 0
Abstract
With the spread of sensors and advances in pose estimation algorithms, skeleton-based action recognition has gradually become a mainstream approach to human action recognition. The key to skeleton-based action recognition is to extract, from sensor data, feature representations that accurately capture the characteristics of human actions. In this letter, we propose a separable spatial-temporal graph learning approach composed of independent spatial and temporal graph networks. In the spatial graph network, a spectral graph convolutional network extracts the spatial features of the skeleton at each time step. In the temporal graph network, an embedded global-local attention mechanism captures the interdependence between different time steps. Extensive experiments on the NTU-RGB+D and NTU-RGB+D 120 datasets show that the proposed method outperforms several baselines.
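The abstract does not give the layer definitions, so the following is only a rough sketch of the separable idea, not the authors' implementation. It assumes a first-order approximation of spectral graph convolution (symmetrically normalized adjacency with self-loops) applied to each frame independently, followed by plain scaled dot-product attention over frames. The function names, the toy five-joint chain skeleton, the mean pooling over joints, and the summation of a global and a windowed (local) attention branch are all illustrative assumptions.

```python
import numpy as np

def normalized_adjacency(edges, num_joints):
    """Return D^{-1/2} (A + I) D^{-1/2} for an undirected skeleton graph."""
    a = np.eye(num_joints)
    for i, j in edges:
        a[i, j] = 1.0
        a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def spatial_gcn(x, a_hat, w):
    """One graph-convolution layer with ReLU, applied to every frame separately.

    x: (T, V, C_in), a_hat: (V, V), w: (C_in, C_out) -> (T, V, C_out)
    """
    return np.maximum(np.einsum("uv,tvc,cd->tud", a_hat, x, w), 0.0)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def temporal_attention(h, window=None):
    """Scaled dot-product self-attention over frames.

    h: (T, D). window=None attends to all frames (global); an integer
    restricts each frame to neighbours within that many steps (local).
    """
    num_frames, dim = h.shape
    scores = h @ h.T / np.sqrt(dim)
    if window is not None:
        idx = np.arange(num_frames)
        too_far = np.abs(idx[:, None] - idx[None, :]) > window
        scores = np.where(too_far, -np.inf, scores)
    return softmax(scores) @ h

# Toy input: 8 frames, 5 joints in a chain, 3 coordinates per joint (assumed).
rng = np.random.default_rng(0)
T, V, C = 8, 5, 3
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
x = rng.normal(size=(T, V, C))

a_hat = normalized_adjacency(edges, V)
w = rng.normal(size=(C, 4))
spatial = spatial_gcn(x, a_hat, w)   # per-frame spatial features, (T, V, 4)
frames = spatial.mean(axis=1)        # pool over joints, (T, 4)

# Assumed fusion: sum of a global branch and a local (window=1) branch.
fused = temporal_attention(frames) + temporal_attention(frames, window=1)
```

Because the spatial step never mixes frames and the temporal step never mixes joints, the two stages can be swapped out or tuned independently, which is the sense of "separable" used in this sketch.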