Semi-supervised learning for skeleton behavior recognition: A multi-dimensional graph comparison approach

IF 5.2, Zone 2 (Computer Science), JCR Q1 (COMPUTER SCIENCE, INFORMATION SYSTEMS)

Journal of King Saud University-Computer and Information Sciences, Pub Date: 2024-12-01, DOI: 10.1016/j.jksuci.2024.102266

Qiang Zhao, Moyan Zhang, Hongjuan Li, Baozhen Song, Yujun Li

Volume 36 (10), Article 102266. Full text: https://www.sciencedirect.com/science/article/pii/S1319157824003550

Citations: 0

Abstract


Semi-supervised learning for skeleton behavior recognition: A multi-dimensional graph comparison approach

Skeleton-based action recognition is an important research direction in computer vision. Most existing methods rely heavily on large amounts of labeled training data, which significantly constrains their training effectiveness and generalization capability when labeled data is scarce. How to combine labeled and unlabeled data to overcome label scarcity has therefore become a central research question in skeleton-based action recognition.

To address this label scarcity problem, this paper introduces a semi-supervised skeleton-based action recognition approach based on multi-dimensional feature-based graph contrastive learning. First, three feature extractors are devised to thoroughly extract and exploit the informative cues available in limited data. The holistic feature extractor comprises five spatio-temporal graph convolutional blocks and a global average pooling layer. The detailed feature extractor is constructed by stacking the same spatio-temporal graph convolutional blocks, while the relational feature extractor primarily integrates stacked attention graph convolutional blocks and a global average pooling layer. Second, the sample relationship construction mechanism in graph contrastive learning is enhanced. A clustering process is employed to form soft positive/negative sample pairs based on sample similarity, and a sample connectivity matrix further weights the distances between these pairs, thereby improving classification accuracy. Furthermore, a novel loss function grounded in information bottleneck theory is formulated to guide the model towards learning more robust and efficient skeleton action representations.

Experimental evaluations demonstrate the superiority of the proposed method (MDKS) on two datasets, NTU60 and NW-UCLA. Specifically, on the NTU60 dataset, MDKS achieves classification accuracy improvements of 4.7% and 1.9% under the X-sub and X-view evaluation protocols, respectively, compared to the benchmark MAC-Learning method. On the NW-UCLA dataset, MDKS outperforms MAC-Learning by 1.4%, 1.2%, 1.9%, and 1.4% in classification accuracy under different labeled data ratios ranging from 5% to 40%. This work offers novel insights and methodologies for advancing skeleton-based action recognition. Future research will address label imbalance, label noise, multi-modal information fusion, and cross-scene generalization.
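The abstract gives no formulas for the soft positive/negative pairs or the connectivity weighting, so the following is only an illustrative sketch of the general idea, not the authors' MDKS loss. It assumes a weighted InfoNCE-style contrastive loss in which pairs in the same cluster act as soft positives and a connectivity score scales each pair's contribution. The function names (`soft_positive_weights`, `weighted_contrastive_loss`), the weighting rule, and the temperature value are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def soft_positive_weights(cluster_ids, connectivity):
    """Assumed rule: samples in the same cluster form a soft positive pair,
    weighted by their connectivity score; all other pairs get weight 0."""
    n = len(cluster_ids)
    return [
        [
            connectivity[i][j] if i != j and cluster_ids[i] == cluster_ids[j] else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]


def weighted_contrastive_loss(embeddings, weights, temperature=0.5):
    """Weighted InfoNCE-style loss, averaged over anchors that have
    at least one soft positive. Anchors without positives are skipped."""
    n = len(embeddings)
    total, counted = 0.0, 0
    for i in range(n):
        # Temperature-scaled similarities between anchor i and every other sample.
        logits = {
            j: cosine(embeddings[i], embeddings[j]) / temperature
            for j in range(n)
            if j != i
        }
        log_denominator = math.log(sum(math.exp(v) for v in logits.values()))
        weight_sum = sum(weights[i][j] for j in logits)
        if weight_sum == 0:
            continue
        # Weighted average of the negative log-probabilities of the soft positives.
        anchor_loss = -sum(
            weights[i][j] * (logits[j] - log_denominator) for j in logits
        ) / weight_sum
        total += anchor_loss
        counted += 1
    return total / counted if counted else 0.0


# Toy usage: four 2-D embeddings, two clusters, uniform connectivity.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
connectivity = [[1.0] * 4 for _ in range(4)]
weights = soft_positive_weights([0, 0, 1, 1], connectivity)
loss = weighted_contrastive_loss(embeddings, weights)
```

In this toy setup the loss is lower when the cluster assignment groups the embeddings that are actually similar, which is the behaviour a clustering-based pair construction relies on. A real implementation would operate on batched tensors from the feature extractors rather than Python lists.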


Source journal

Journal of King Saud University-Computer and Information Sciences (COMPUTER SCIENCE, INFORMATION SYSTEMS)

CiteScore: 10.50

Self-citation rate: 8.70%

Articles published: 656

Review time: 29 days

About the journal: In 2022 the Journal of King Saud University - Computer and Information Sciences will become an author-paid open access journal. Authors who submit their manuscript after October 31st 2021 will be asked to pay an Article Processing Charge (APC) after acceptance of their paper to make their work immediately, permanently, and freely accessible to all. The Journal of King Saud University - Computer and Information Sciences is a refereed, international journal that covers all aspects of both the foundations of computing and its practical applications.