SG-CLR: Semantic representation-guided contrastive learning for self-supervised skeleton-based action recognition

IF 7.5 | Zone 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Pub Date : 2025-01-18 DOI:10.1016/j.patcog.2025.111377

Ruyi Liu, Yi Liu, Mengyao Wu, Wentian Xin, Qiguang Miao, Xiangzeng Liu, Long Li

Pattern Recognition, Volume 162, Article 111377. Full text: https://www.sciencedirect.com/science/article/pii/S0031320325000378

Citations: 0

Abstract

Contrastive learning and multimodal representation learning have been widely applied to skeleton-based action recognition. However, most existing research focuses on mining spatial–temporal features while ignoring the semantic information of actions. To address this drawback, we propose a novel contrastive learning framework (SG-CLR) for skeleton-based action recognition, which captures fine-grained, multi-level discriminative features by incorporating both semantic compensation and spatial–temporal feature reinforcement. For semantic compensation contrastive learning, we combine LLM-generated action descriptions with multi-modal encoders to integrate cross-modal multivariate features (e.g., skeleton and text features), thereby achieving dynamic compensation of high-order semantic information. For spatial–temporal enhancement contrastive learning, the SkeleMask augmentation is proposed to mine more high-level temporal movement information. Experiments demonstrate that the proposed SG-CLR achieves state-of-the-art performance on the NTU RGB+D 60, NTU RGB+D 120, and PKU-MMD datasets. Related code will be available at https://github.com/QingZhiWMY/SG-CLR.
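The abstract does not state the loss used to align skeleton and text features. As a rough illustration only, the sketch below assumes a symmetric InfoNCE objective, a common choice for cross-modal contrastive learning, over a batch of paired embeddings. The function names, the temperature value, and the toy vectors are all hypothetical and are not taken from the paper or its repository.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(skeleton_feats, text_feats, temperature=0.1):
    """Symmetric InfoNCE over paired (skeleton, text) embeddings.

    Assumption: row i of each input describes the same action sample,
    so the diagonal of the similarity matrix holds the positive pairs.
    """
    s = [l2_normalize(v) for v in skeleton_feats]
    t = [l2_normalize(v) for v in text_feats]
    n = len(s)

    # logits[i][j]: cosine similarity of skeleton i and text j, scaled.
    logits = [
        [sum(a * b for a, b in zip(s[i], t[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def diagonal_cross_entropy(rows):
        # Mean of -log softmax(row)[i], using a stable log-sum-exp.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            log_sum = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_sum - row[i]
        return total / len(rows)

    # Average the skeleton-to-text and text-to-skeleton directions.
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(transposed))


# Toy embeddings: matched pairs should score a lower loss than swapped pairs.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this toy setup the loss is near zero when each skeleton embedding matches its own text embedding and large when the pairs are swapped, which is the behavior a cross-modal contrastive objective is meant to reward.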

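Source journal

Pattern Recognition (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 14.40
Self-citation rate: 16.20%
Articles published: 683
Review time: 5.6 months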

About the journal: The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.