SG-CLR: Semantic representation-guided contrastive learning for self-supervised skeleton-based action recognition
Ruyi Liu, Yi Liu, Mengyao Wu, Wentian Xin, Qiguang Miao, Xiangzeng Liu, Long Li
Pattern Recognition, vol. 162, Article 111377 (published 2025-01-18). DOI: 10.1016/j.patcog.2025.111377. https://www.sciencedirect.com/science/article/pii/S0031320325000378
Abstract
Contrastive learning and multimodal representation learning have been widely applied to skeleton-based action recognition. However, most existing work focuses on mining spatial–temporal features while ignoring the semantic information of actions. To address this limitation, we propose SG-CLR, a novel contrastive learning framework for skeleton-based action recognition that captures fine-grained, multi-level discriminative features by incorporating both semantic compensation and spatial–temporal feature reinforcement. In semantic compensation contrastive learning, action descriptions generated by large language models (LLMs) are combined with multimodal encoders to integrate cross-modal multivariate features (e.g., skeleton and text features), dynamically compensating for high-order semantic information. In spatial–temporal enhancement contrastive learning, the proposed SkeleMask augmentation mines richer high-level temporal movement information. Experiments demonstrate that SG-CLR achieves state-of-the-art performance on the NTU RGB+D 60, NTU RGB+D 120, and PKU-MMD datasets. Related code will be available at https://github.com/QingZhiWMY/SG-CLR.
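The abstract describes both branches only at a high level. As a rough illustration, the sketch below shows a generic symmetric skeleton-text InfoNCE objective of the kind used in CLIP-style cross-modal contrastive learning, plus a generic random frame-masking augmentation. Both are assumptions: the function names, the temperature of 0.07, and the masking scheme are hypothetical, and neither is the paper's actual SG-CLR loss or its SkeleMask augmentation. The embeddings are assumed to come from some skeleton encoder and some text encoder applied to LLM-generated action descriptions.

```python
import math
import random


def l2_normalize(v):
    """Scale a vector to unit length (tiny fallback avoids division by zero)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1e-12
    return [x / norm for x in v]


def cosine_matrix(a, b):
    """Pairwise cosine similarities between the rows of a and the rows of b."""
    a = [l2_normalize(v) for v in a]
    b = [l2_normalize(v) for v in b]
    return [[sum(x * y for x, y in zip(u, w)) for w in b] for u in a]


def cross_entropy_diag(logits):
    """Mean cross-entropy over rows, where the positive for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)


def skeleton_text_infonce(skel_emb, text_emb, temperature=0.07):
    """Hypothetical symmetric InfoNCE loss: the skeleton and text embeddings of
    sample i form a positive pair; every other pairing in the batch is a negative.
    Assumes both inputs hold the same number of samples."""
    sim = cosine_matrix(skel_emb, text_emb)
    logits = [[s / temperature for s in row] for row in sim]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-skeleton direction
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))


def random_temporal_mask(sequence, mask_ratio=0.3, rng=None):
    """Hypothetical generic augmentation (NOT SkeleMask): zero out a random subset
    of frames. Each frame is a flat list of joint coordinates; the input is not modified."""
    rng = rng or random.Random()
    num_frames = len(sequence)
    num_masked = int(round(mask_ratio * num_frames))
    masked = set(rng.sample(range(num_frames), num_masked))
    return [[0.0] * len(frame) if t in masked else list(frame)
            for t, frame in enumerate(sequence)]
```

With perfectly aligned skeleton-text pairs the loss approaches zero, while swapping the text embeddings across samples drives it up, which is the behavior a contrastive objective is meant to reward. Masking whole frames forces an encoder to infer motion from the remaining temporal context; how SkeleMask actually selects what to mask is not stated in the abstract.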
About the journal:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.