Musrea Abdo Ghaseb, Ahmed Elhayek, Fawaz Alsolami, Abdullah Marish Ali
IEEE Transactions on Multimedia, vol. 27, pp. 3437-3446. Published 2025-01-27. DOI: 10.1109/TMM.2025.3535284
S3GAAR: Segmented Spatiotemporal Skeleton Graph-Attention for Action Recognition
Human motion recognition is extremely important for many practical applications across several disciplines, such as surveillance, medicine, sports, gait analysis, and computer graphics. Graph convolutional networks (GCNs) improve the accuracy and performance of skeleton-based action recognition. However, they have difficulty modeling long-term temporal dependencies. In addition, the fixed topology of the skeleton graph is not robust enough to extract features for skeleton motions. Although transformers that rely entirely on self-attention have shown great success in modeling global correlations between inputs and outputs, they ignore the local correlations between joints. In this study, we propose a novel segmented spatiotemporal skeleton graph-attention network (S3GAAR) that effectively learns different human actions and concentrates on the most relevant part of the human body for each action. S3GAAR models spatial-temporal features through spatiotemporal attention within each segment to capture short-term temporal dependencies. Because many human actions, such as mutual actions, involve one or more specific body parts, our method divides the human skeleton into three segments — superior, inferior, and extremity joints — and extracts the features of each segment individually. Moreover, our segmented spatiotemporal graph introduces additional edges between important distant joints in the same segment. The experimental results show that our method outperforms state-of-the-art methods by up to 1.1% on two large-scale benchmark datasets, NTU-RGB+D 60 and NTU-RGB+D 120.
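The per-segment graph construction described in the abstract — restricting the skeleton graph to one segment and adding extra edges between distant joints of that segment — can be sketched as follows. This is an illustrative reconstruction only: the helper name, the toy bone list, and the joint-to-segment assignment are assumptions, not the paper's actual implementation.

```python
import numpy as np

def segment_adjacency(num_joints, bones, segment, extra_edges=()):
    """Build a symmetric adjacency matrix restricted to one skeleton segment.

    bones       -- physical (parent, child) joint pairs of the full skeleton
    segment     -- joint indices belonging to this segment (e.g. superior)
    extra_edges -- additional links between important distant joints,
                   as the segmented spatiotemporal graph introduces
    Edges whose endpoints fall outside the segment are dropped.
    """
    seg = set(segment)
    A = np.zeros((num_joints, num_joints), dtype=np.float32)
    for i, j in list(bones) + list(extra_edges):
        if i in seg and j in seg:
            A[i, j] = A[j, i] = 1.0
    return A

# Toy 5-joint chain; joints 0-2 form one hypothetical segment.
bones = [(0, 1), (1, 2), (2, 3), (3, 4)]
A = segment_adjacency(5, bones, segment=[0, 1, 2], extra_edges=[(0, 2)])
```

In this sketch, the bone (2, 3) is excluded because joint 3 lies outside the segment, while the extra edge (0, 2) links two non-adjacent joints inside it — mirroring the paper's idea of connecting important distant joints within the same segment.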
Journal Introduction:
The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.