PRG-Net: Point Relationship-Guided Network for 3D human action recognition

IF 6.5 · CAS Tier 2 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Yao Du, Zhenjie Hou, En Lin, Xing Li, Jiuzhen Liang, Xinwen Zhou
{"title":"PRG-Net:用于三维人体动作识别的点关系导向网络","authors":"Yao Du ,&nbsp;Zhenjie Hou ,&nbsp;En Lin ,&nbsp;Xing Li ,&nbsp;Jiuzhen Liang ,&nbsp;Xinwen Zhou","doi":"10.1016/j.neucom.2025.130015","DOIUrl":null,"url":null,"abstract":"<div><div>Point clouds contain rich spatial information, providing important supplementary clues for human action recognition. Recent methods for action recognition based on point cloud sequences primarily rely on complex spatiotemporal local encoding. However, these methods often utilize max-pooling operations to select features when extracting local features, restricting feature updates to local neighborhoods and failing to fully exploit the relationships between regions. Moreover, cross-frame encoding can also lead to the loss of spatiotemporal information. In this study, we propose PRG-Net, a Point Relation Guided Network, to further improve the learning of spatiotemporal features in point clouds. First, we designed two core modules: the Spatial Feature Aggregation (SFA) and the Spatial Feature Descriptor (SFD) modules. The SFA module expands the spatial structure between regions using dynamic aggregation techniques, while the SFD module guides the region aggregation process by Attention-Weighted Descriptors. They enhance the modeling of human spatial structure by expanding the relationships between regions. Second, we introduce inter-frame motion encoding techniques that can obtain the final spatiotemporal representation of the human body through the aggregation of cross-frame vectors, without relying on complex spatiotemporal local encoding. We evaluate PRG-Net on publicly available human action recognition datasets, including NTU RGB+D 60, NTU RGB+D 120, UTD-MHAD, and MSR Action 3D. Experimental results demonstrate that our method outperforms state-of-the-art point-based 3D action recognition methods significantly. Furthermore, we conduct extended experiments on the SHREC 2017 dataset for gesture recognition, and the results show that our method maintains competitive performance on that dataset as well.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"635 ","pages":"Article 130015"},"PeriodicalIF":6.5000,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"PRG-Net: Point Relationship-Guided Network for 3D human action recognition\",\"authors\":\"Yao Du ,&nbsp;Zhenjie Hou ,&nbsp;En Lin ,&nbsp;Xing Li ,&nbsp;Jiuzhen Liang ,&nbsp;Xinwen Zhou\",\"doi\":\"10.1016/j.neucom.2025.130015\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Point clouds contain rich spatial information, providing important supplementary clues for human action recognition. Recent methods for action recognition based on point cloud sequences primarily rely on complex spatiotemporal local encoding. However, these methods often utilize max-pooling operations to select features when extracting local features, restricting feature updates to local neighborhoods and failing to fully exploit the relationships between regions. Moreover, cross-frame encoding can also lead to the loss of spatiotemporal information. In this study, we propose PRG-Net, a Point Relation Guided Network, to further improve the learning of spatiotemporal features in point clouds. First, we designed two core modules: the Spatial Feature Aggregation (SFA) and the Spatial Feature Descriptor (SFD) modules. 
The SFA module expands the spatial structure between regions using dynamic aggregation techniques, while the SFD module guides the region aggregation process by Attention-Weighted Descriptors. They enhance the modeling of human spatial structure by expanding the relationships between regions. Second, we introduce inter-frame motion encoding techniques that can obtain the final spatiotemporal representation of the human body through the aggregation of cross-frame vectors, without relying on complex spatiotemporal local encoding. We evaluate PRG-Net on publicly available human action recognition datasets, including NTU RGB+D 60, NTU RGB+D 120, UTD-MHAD, and MSR Action 3D. Experimental results demonstrate that our method outperforms state-of-the-art point-based 3D action recognition methods significantly. Furthermore, we conduct extended experiments on the SHREC 2017 dataset for gesture recognition, and the results show that our method maintains competitive performance on that dataset as well.</div></div>\",\"PeriodicalId\":19268,\"journal\":{\"name\":\"Neurocomputing\",\"volume\":\"635 \",\"pages\":\"Article 130015\"},\"PeriodicalIF\":6.5000,\"publicationDate\":\"2025-03-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Neurocomputing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0925231225006873\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neurocomputing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0925231225006873","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Point clouds contain rich spatial information, providing important supplementary clues for human action recognition. Recent methods for action recognition based on point cloud sequences rely primarily on complex spatiotemporal local encoding. However, when extracting local features these methods often select features with max-pooling, which restricts feature updates to local neighborhoods and fails to fully exploit the relationships between regions. Moreover, cross-frame encoding can lead to the loss of spatiotemporal information. In this study, we propose PRG-Net, a Point Relationship-Guided Network, to further improve the learning of spatiotemporal features in point clouds. First, we design two core modules: the Spatial Feature Aggregation (SFA) module and the Spatial Feature Descriptor (SFD) module. The SFA module expands the spatial structure between regions using dynamic aggregation techniques, while the SFD module guides the region aggregation process with attention-weighted descriptors. Together they enhance the modeling of human spatial structure by expanding the relationships between regions. Second, we introduce an inter-frame motion encoding technique that obtains the final spatiotemporal representation of the human body by aggregating cross-frame vectors, without relying on complex spatiotemporal local encoding. We evaluate PRG-Net on publicly available human action recognition datasets, including NTU RGB+D 60, NTU RGB+D 120, UTD-MHAD, and MSR Action 3D. Experimental results demonstrate that our method significantly outperforms state-of-the-art point-based 3D action recognition methods. Furthermore, extended experiments on the SHREC 2017 gesture recognition dataset show that our method maintains competitive performance there as well.
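The abstract does not give the SFD module's exact formulation, so the following is only a minimal PyTorch sketch of the general idea of attention-weighted descriptor pooling: scoring each point in a region and taking a softmax-weighted sum instead of a hard max, so that every point contributes to the region descriptor. The class and parameter names (`AttentionWeightedPooling`, `feat_dim`) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of attention-weighted region aggregation in PyTorch.
# Shapes and names are illustrative assumptions; the abstract does not
# specify the authors' exact formulation.
import torch
import torch.nn as nn


class AttentionWeightedPooling(nn.Module):
    """Aggregate per-point features within a region using learned
    attention weights instead of max-pooling, so every point can
    contribute to the region descriptor."""

    def __init__(self, feat_dim: int):
        super().__init__()
        # Produces one attention logit per point from its feature vector.
        self.score = nn.Sequential(
            nn.Linear(feat_dim, feat_dim),
            nn.ReLU(inplace=True),
            nn.Linear(feat_dim, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, regions, points_per_region, feat_dim)
        logits = self.score(x)                  # (B, R, P, 1)
        weights = torch.softmax(logits, dim=2)  # normalize over points
        # Softmax-weighted sum over the points in each region.
        return (weights * x).sum(dim=2)         # (B, R, feat_dim)


if __name__ == "__main__":
    pool = AttentionWeightedPooling(feat_dim=64)
    regions = torch.randn(2, 32, 16, 64)  # 2 clips, 32 regions, 16 points
    print(pool(regions).shape)            # torch.Size([2, 32, 64])
```

Unlike max-pooling, the softmax-weighted sum keeps gradients flowing to every point in a neighborhood, which addresses the abstract's criticism that max-pooling restricts feature updates to local neighborhoods.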
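The inter-frame motion encoding is likewise only described at a high level. One plausible reading, differencing per-frame descriptors to form cross-frame vectors and aggregating them over time without any per-neighborhood spatiotemporal encoding, might look like the sketch below; `InterFrameMotionEncoder` and all shapes are assumptions for illustration.

```python
# Minimal sketch of inter-frame motion encoding via aggregation of
# cross-frame difference vectors; an assumed reading of the abstract,
# not the authors' implementation.
import torch
import torch.nn as nn


class InterFrameMotionEncoder(nn.Module):
    """Encode motion as differences between consecutive per-frame
    descriptors, then aggregate them into one clip-level
    spatiotemporal representation."""

    def __init__(self, feat_dim: int):
        super().__init__()
        self.motion_mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, feat_dim),
            nn.ReLU(inplace=True),
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, feat_dim) per-frame spatial descriptors
        diffs = frames[:, 1:] - frames[:, :-1]         # cross-frame vectors
        fused = torch.cat([frames[:, 1:], diffs], -1)  # appearance + motion
        motion = self.motion_mlp(fused)                # (B, T-1, feat_dim)
        # Temporal aggregation into the final clip descriptor.
        return motion.mean(dim=1)                      # (B, feat_dim)
```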
Source journal
Neurocomputing (Engineering & Technology - Computer Science: Artificial Intelligence)
CiteScore: 13.10
Self-citation rate: 10.00%
Articles per year: 1382
Review time: 70 days
About the journal: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.