Title: Frequency-Aware Self-Supervised Group Activity Recognition with skeleton sequences
Authors: Guoquan Wang, Mengyuan Liu, Hong Liu, Jinyan Zhang, Peini Guo, Ruijia Fan, Siyu Chen
Journal: Pattern Recognition, volume 167, Article 111710 (impact factor 7.5; JCR Q1, Computer Science, Artificial Intelligence)
Publication date: 2025-04-21
DOI: 10.1016/j.patcog.2025.111710
URL: https://www.sciencedirect.com/science/article/pii/S003132032500370X
Citations: 0
Abstract
Self-supervised, skeleton-based techniques have recently demonstrated great potential for group activity recognition via contrastive learning. However, these methods have difficulty accommodating the dynamic and complex nature of spatio-temporal data, weakening the ability to conduct effective modeling and extract crucial features. To this end, we propose a novel Frequency-Aware Group Activity Recognition (FAGAR) network, which offers a comprehensive solution by addressing three key subproblems. First, the challenge of extracting discriminative features is further exacerbated by pose estimation algorithms’ limitations under random spatio-temporal data augmentation. To mitigate this, a frequency domain passing augmentation method that emphasizes individual collaborative changes is introduced, effectively filtering out noise interference. Second, the fixed connections in traditional relation modeling networks fail to adapt to dynamic scene changes. To address this, we design an adaptive frequency domain compression network, which dynamically adjusts to scene variations. Third, the temporal modeling process often leads to a loss of focus on key features, reducing the model’s ability to assess individual contributions within a group. To resolve this, we propose an amplitude-aware loss function that guides the network in learning the relative importance of individuals, ensuring it maintains the correct learning direction. Our FAGAR achieves state-of-the-art performance on several datasets for self-supervised skeleton-based group activity recognition. Code is available at https://github.com/WGQ109/FAGAR.
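The abstract does not say how the frequency-domain passing augmentation is built, so the following is only a minimal sketch of the general idea, assuming a plain temporal low-pass filter applied with an FFT. The function name `low_pass_augment`, the parameter `keep_ratio`, and the `(frames, persons, joints, coordinates)` array layout are all hypothetical and are not taken from the FAGAR code.

```python
import numpy as np


def low_pass_augment(seq: np.ndarray, keep_ratio: float = 0.25) -> np.ndarray:
    """Suppress high temporal frequencies in a skeleton sequence.

    seq: array of shape (T, N, J, C) = frames, persons, joints, coordinates
         (an assumed layout, not the paper's).
    keep_ratio: hypothetical knob, the fraction of rFFT bins to retain,
         counted from the lowest frequency upward.
    """
    num_frames = seq.shape[0]

    # Real FFT along the time axis: result has T // 2 + 1 frequency bins.
    spectrum = np.fft.rfft(seq, axis=0)

    # Keep at least the DC bin so the mean pose is always preserved.
    n_keep = max(1, int(np.ceil(keep_ratio * spectrum.shape[0])))

    # Zero every bin above the cutoff (a hard "pass" mask).
    spectrum[n_keep:] = 0

    # Back to the time domain with the original number of frames.
    return np.fft.irfft(spectrum, n=num_frames, axis=0)


if __name__ == "__main__":
    t = np.arange(64)
    # Slow shared motion plus fast jitter standing in for pose-estimation noise.
    slow = np.sin(2 * np.pi * t / 64)[:, None, None, None] * np.ones((1, 2, 3, 2))
    fast = 0.1 * np.sin(2 * np.pi * 20 * t / 64)[:, None, None, None] * np.ones((1, 2, 3, 2))
    smoothed = low_pass_augment(slow + fast)
    print(np.abs(smoothed - slow).max())  # residual jitter after filtering
```

A hard cutoff is the simplest possible mask; a learned or soft mask, as an adaptive method would likely need, would replace the zeroing step.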
About the journal:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.