{"title":"基于Actionlet对比与重构的自监督骨架表示学习。","authors":"Lilang Lin;Jiahang Zhang;Jiaying Liu","doi":"10.1109/TPAMI.2025.3598138","DOIUrl":null,"url":null,"abstract":"Contrastive learning has shown remarkable success in the domain of skeleton-based action recognition. However, the design of data transformations, which is crucial for effective contrastive learning, remains a challenging aspect in the context of skeleton-based action recognition. The difficulty lies in creating data transformations that capture rich motion patterns while ensuring that the transformed data retains the same semantic information. To tackle this challenge, we introduce an innovative framework called ActCLR+ (Actionlet-Dependent Contrastive Learning), which explicitly distinguishes between static and dynamic regions in a skeleton sequence. We begin by introducing the concept of <italic>actionlet</i>, connecting self-supervised learning quantitatively with downstream tasks. Actionlets represent regions in the skeleton where features closely align with action prototypes, highlighting dynamic sequences as distinct from static ones. We propose an anchor-based method for unsupervised actionlet discovery, establishing a motion-adaptive data transformation approach based on this discovery. This motion-adaptive data transformation strategy tailors data transformations for actionlet and non-actionlet regions, respectively, introducing more diverse motion patterns while preserving the original motion semantics. Additionally, we incorporate a semantic-aware masked motion modeling technique to enhance the learning of actionlet representations. Our comprehensive experiments on well-established benchmark datasets such as NTU RGB+D and PKUMMD validate the effectiveness of our proposed method.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 11","pages":"10818-10835"},"PeriodicalIF":18.6000,"publicationDate":"2025-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Self-Supervised Skeleton Representation Learning Via Actionlet Contrast and Reconstruct\",\"authors\":\"Lilang Lin;Jiahang Zhang;Jiaying Liu\",\"doi\":\"10.1109/TPAMI.2025.3598138\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Contrastive learning has shown remarkable success in the domain of skeleton-based action recognition. However, the design of data transformations, which is crucial for effective contrastive learning, remains a challenging aspect in the context of skeleton-based action recognition. The difficulty lies in creating data transformations that capture rich motion patterns while ensuring that the transformed data retains the same semantic information. To tackle this challenge, we introduce an innovative framework called ActCLR+ (Actionlet-Dependent Contrastive Learning), which explicitly distinguishes between static and dynamic regions in a skeleton sequence. We begin by introducing the concept of <italic>actionlet</i>, connecting self-supervised learning quantitatively with downstream tasks. Actionlets represent regions in the skeleton where features closely align with action prototypes, highlighting dynamic sequences as distinct from static ones. We propose an anchor-based method for unsupervised actionlet discovery, establishing a motion-adaptive data transformation approach based on this discovery. 
This motion-adaptive data transformation strategy tailors data transformations for actionlet and non-actionlet regions, respectively, introducing more diverse motion patterns while preserving the original motion semantics. Additionally, we incorporate a semantic-aware masked motion modeling technique to enhance the learning of actionlet representations. Our comprehensive experiments on well-established benchmark datasets such as NTU RGB+D and PKUMMD validate the effectiveness of our proposed method.\",\"PeriodicalId\":94034,\"journal\":{\"name\":\"IEEE transactions on pattern analysis and machine intelligence\",\"volume\":\"47 11\",\"pages\":\"10818-10835\"},\"PeriodicalIF\":18.6000,\"publicationDate\":\"2025-08-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on pattern analysis and machine intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/11123705/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on pattern analysis and machine intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/11123705/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Self-Supervised Skeleton Representation Learning Via Actionlet Contrast and Reconstruct
Contrastive learning has shown remarkable success in skeleton-based action recognition. However, the design of data transformations, which is crucial for effective contrastive learning, remains challenging in this setting. The difficulty lies in creating transformations that introduce rich motion patterns while ensuring that the transformed data retains the same semantic information. To tackle this challenge, we introduce ActCLR+ (Actionlet-Dependent Contrastive Learning), a framework that explicitly distinguishes between static and dynamic regions of a skeleton sequence. We begin by introducing the concept of the actionlet, which connects self-supervised learning quantitatively with downstream tasks. Actionlets are regions of the skeleton whose features closely align with action prototypes, distinguishing the dynamic parts of a sequence from the static ones. We propose an anchor-based method for unsupervised actionlet discovery and, building on it, a motion-adaptive data transformation approach. This strategy tailors transformations separately to actionlet and non-actionlet regions, introducing more diverse motion patterns while preserving the original motion semantics. Additionally, we incorporate a semantic-aware masked motion modeling technique to enhance the learning of actionlet representations. Comprehensive experiments on established benchmark datasets such as NTU RGB+D and PKU-MMD validate the effectiveness of the proposed method.
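The abstract only sketches the mechanism at a high level. As an illustration of the general idea rather than the paper's exact algorithm, the snippet below shows one plausible way an actionlet mask could be obtained by comparing a skeleton sequence against a static anchor (e.g., a motion-free average pose), and how augmentation could then be applied more aggressively to non-actionlet joints so that the semantics carried by the dynamic region are left intact. All function names, the threshold, and the Gaussian-noise augmentation are hypothetical placeholders, not the authors' implementation.

```python
import numpy as np

def discover_actionlet_mask(seq, anchor, threshold=0.5):
    """Hypothetical actionlet discovery: joints that deviate most from a
    static anchor are treated as the actionlet (dynamic) region.

    seq:    (T, J, C) skeleton sequence (frames, joints, coordinates)
    anchor: (J, C) static reference, e.g., the temporal mean pose
    """
    # Per-joint deviation from the anchor, averaged over time.
    deviation = np.linalg.norm(seq - anchor, axis=-1).mean(axis=0)  # (J,)
    # Normalize to [0, 1] and threshold into a binary joint mask.
    spread = deviation.max() - deviation.min()
    deviation = (deviation - deviation.min()) / (spread + 1e-6)
    return deviation > threshold  # (J,) True = actionlet joint

def motion_adaptive_transform(seq, actionlet_mask, rng, noise_std=0.05):
    """Hypothetical motion-adaptive augmentation: perturb the non-actionlet
    (mostly static) joints while leaving actionlet joints untouched."""
    out = seq.copy()
    static_joints = ~actionlet_mask
    out[:, static_joints, :] += rng.normal(
        0.0, noise_std, out[:, static_joints, :].shape
    )
    return out

# Toy usage with random data standing in for real NTU RGB+D skeletons.
rng = np.random.default_rng(0)
seq = rng.normal(size=(64, 25, 3))   # 64 frames, 25 joints, 3D coordinates
anchor = seq.mean(axis=0)            # crude "static" reference pose
mask = discover_actionlet_mask(seq, anchor)
augmented = motion_adaptive_transform(seq, mask, rng)
```

In this sketch, the two augmented views fed to the contrastive objective would differ mainly in their static regions, which is one way to realize the "diverse motion patterns without changing semantics" goal stated above.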
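The semantic-aware masked motion modeling component is likewise only named in the abstract. The fragment below is a minimal, hypothetical illustration of the masking side of such an objective: frames are hidden at the actionlet joints and a placeholder predictor (a per-joint temporal mean, standing in for the learned encoder/decoder) is scored on how well it recovers them.

```python
import numpy as np

def masked_reconstruction_loss(seq, actionlet_mask, mask_ratio=0.5, rng=None):
    """Hypothetical semantic-aware masking: hide a fraction of frames at the
    actionlet (dynamic) joints and measure reconstruction error there."""
    if rng is None:
        rng = np.random.default_rng()
    T, J, _ = seq.shape

    # Mask randomly chosen frames, but only at actionlet joints.
    frame_ids = rng.choice(T, size=int(T * mask_ratio), replace=False)
    hide = np.zeros((T, J), dtype=bool)
    hide[np.ix_(frame_ids, np.flatnonzero(actionlet_mask))] = True
    if not hide.any():
        return 0.0

    masked = np.where(hide[..., None], 0.0, seq)
    # Placeholder "predictor": fill hidden entries with the temporal mean pose.
    prediction = np.where(hide[..., None], seq.mean(axis=0), masked)

    # Mean squared error on the hidden actionlet entries only.
    return float(np.mean((prediction[hide] - seq[hide]) ** 2))

# Continues the toy example above: loss = masked_reconstruction_loss(seq, mask)
```

Restricting the masking and reconstruction target to the actionlet region is one reading of "semantic-aware": the pretext task focuses on the joints that carry the action, rather than on static context.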