Latent space improved masked reconstruction model for human skeleton-based action recognition.

IF 2.8 | Zone 4, Computer Science | JCR Q3, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Frontiers in Neurorobotics Pub Date : 2025-02-12 eCollection Date: 2025-01-01 DOI:10.3389/fnbot.2025.1482281

Enqing Chen, Xueting Wang, Xin Guo, Ying Zhu, Dong Li

Frontiers in Neurorobotics, volume 19, 1482281 (Journal Article). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11947723/pdf/

Citations: 0

Abstract



Human skeleton-based action recognition is an important task in computer vision. In recent years, the masked autoencoder (MAE) has been used in various fields due to its powerful self-supervised learning ability and has achieved good results in masked data reconstruction tasks. However, in visual classification tasks such as action recognition, the limited ability of the encoder to learn features in the autoencoder structure results in poor classification performance. We propose to enhance the encoder's feature extraction ability in classification tasks by leveraging the latent space of the variational autoencoder (VAE), and further replace it with the latent space of the vector quantized variational autoencoder (VQVAE). The resulting models are called SkeletonMVAE and SkeletonMVQVAE, respectively. In SkeletonMVAE, we constrain the latent variables to represent features in the form of distributions. In SkeletonMVQVAE, we discretize the latent variables. Both help the encoder learn deeper data structures and more discriminative and generalizable feature representations. Experimental results on the NTU-60 and NTU-120 datasets demonstrate that the proposed method effectively improves the classification accuracy of the encoder and its generalization ability when few labeled data are available. SkeletonMVAE exhibits stronger classification ability, while SkeletonMVQVAE exhibits stronger generalization when labeled data are scarce.
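The three ingredients named in the abstract (random masking of input tokens, a distribution-valued VAE latent, and a discretized VQ latent) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the token and feature sizes, the mask ratio, and the codebook size are all hypothetical choices made for illustration, and the encoder itself is replaced by random features.

```python
import numpy as np

def random_mask(num_tokens, mask_ratio, rng):
    """Boolean array, True at masked positions (MAE-style random masking)."""
    num_masked = int(round(num_tokens * mask_ratio))
    mask = np.zeros(num_tokens, dtype=bool)
    mask[rng.choice(num_tokens, size=num_masked, replace=False)] = True
    return mask

def vae_latent(mu, logvar, rng):
    """Reparameterized sample z = mu + sigma * eps, plus per-token
    KL(N(mu, sigma^2) || N(0, I)), the usual VAE regularizer."""
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps
    kl = -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=-1)
    return z, kl

def vq_latent(features, codebook):
    """Replace each feature vector by its nearest codebook entry
    (squared Euclidean distance), as in vector quantization."""
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = d2.argmin(axis=1)
    return codebook[indices], indices

rng = np.random.default_rng(0)

# Hypothetical sizes: 25 tokens with 8-dimensional stand-in encoder features.
tokens = rng.standard_normal((25, 8))
mask = random_mask(num_tokens=25, mask_ratio=0.8, rng=rng)
visible = tokens[~mask]  # in an MAE, only visible tokens reach the encoder

# VAE-style latent: features are treated as means, with unit variance assumed.
z, kl = vae_latent(mu=visible, logvar=np.zeros_like(visible), rng=rng)

# VQ-style latent: snap each feature to one of 16 hypothetical code vectors.
codebook = rng.standard_normal((16, 8))
quantized, indices = vq_latent(visible, codebook)
```

In a trained model the means, log-variances, and codebook would be learned jointly with the reconstruction objective; here they are fixed or random so that only the latent-space mechanics are visible.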


Source journal

Frontiers in Neurorobotics (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; ROBOTICS)

CiteScore: 5.20
Self-citation rate: 6.50%
Articles published: 250
Review time: 14 weeks

Journal description: Frontiers in Neurorobotics publishes rigorously peer-reviewed research in the science and technology of embodied autonomous neural systems. Specialty Chief Editors Alois C. Knoll and Florian Röhrbein at the Technische Universität München are supported by an outstanding Editorial Board of international experts. This multidisciplinary open-access journal is at the forefront of disseminating and communicating scientific knowledge and impactful discoveries to researchers, academics and the public worldwide. Neural systems include brain-inspired algorithms (e.g. connectionist networks), computational models of biological neural networks (e.g. artificial spiking neural nets, large-scale simulations of neural microcircuits) and actual biological systems (e.g. in vivo and in vitro neural nets). The focus of the journal is the embodiment of such neural systems in artificial software and hardware devices, machines, robots or any other form of physical actuation. This also includes prosthetic devices, brain-machine interfaces, wearable systems, micro-machines, furniture, home appliances, as well as systems for managing micro and macro infrastructures. Frontiers in Neurorobotics also aims to publish radically new tools and methods to study plasticity and development of autonomous self-learning systems that are capable of acquiring knowledge in an open-ended manner. Models complemented with experimental studies revealing self-organizing principles of embodied neural systems are welcome. Our journal also publishes on the micro and macro engineering and mechatronics of robotic devices driven by neural systems, as well as studies on the impact that such systems will have on our daily life.