PointDMIG：用于三维动作识别的动态运动信息图神经网络

IF 3.5 3区计算机科学 Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Multimedia Systems Pub Date : 2024-06-27 DOI:10.1007/s00530-024-01395-9

Yao Du, Zhenjie Hou, Xing Li, Jiuzhen Liang, Kaijun You, Xinwen Zhou

{"title":"PointDMIG：用于三维动作识别的动态运动信息图神经网络","authors":"Yao Du, Zhenjie Hou, Xing Li, Jiuzhen Liang, Kaijun You, Xinwen Zhou","doi":"10.1007/s00530-024-01395-9","DOIUrl":null,"url":null,"abstract":"<p>Point cloud contains rich spatial information, providing effective supplementary clues for action recognition. Existing action recognition algorithms based on point cloud sequences typically employ complex spatiotemporal local encoding to capture the spatiotemporal features, leading to the loss of spatial information and the inability to establish long-term spatial correlation. In this paper, we propose a PointDMIG network that models the long-term spatio-temporal correlation in point cloud sequences while retaining spatial structure information. Specifically, we first employ graph-based static point cloud techniques to construct topological structures for input point cloud sequences and encodes them as human static appearance feature vectors, introducing inherent frame-level parallelism to avoid the loss of spatial information. Then, we extend the technique for static point clouds by integrating the motion information of points between adjacent frames into the topological graph structure, capturing the long-term spatio-temporal evolution of human static appearance while preserving its spatial structure. Moreover, in order to enhance the semantic representation of the point cloud sequences, PointDMIG reconstructs the downsampled point set in the feature extraction process, further enriching the spatio-temporal information of human body movements. Experimental results on NTU RGB+D 60 and MSR Action 3D show that PointDMIG significantly improves the accuracy of 3D human action recognition based on point cloud sequences. We also performed an extended experiment on gesture recognition on the SHREC 2017 dataset, and PointDMIG achieved competitive results.</p>","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":"177 1","pages":""},"PeriodicalIF":3.5000,"publicationDate":"2024-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"PointDMIG: a dynamic motion-informed graph neural network for 3D action recognition\",\"authors\":\"Yao Du, Zhenjie Hou, Xing Li, Jiuzhen Liang, Kaijun You, Xinwen Zhou\",\"doi\":\"10.1007/s00530-024-01395-9\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Point cloud contains rich spatial information, providing effective supplementary clues for action recognition. Existing action recognition algorithms based on point cloud sequences typically employ complex spatiotemporal local encoding to capture the spatiotemporal features, leading to the loss of spatial information and the inability to establish long-term spatial correlation. In this paper, we propose a PointDMIG network that models the long-term spatio-temporal correlation in point cloud sequences while retaining spatial structure information. Specifically, we first employ graph-based static point cloud techniques to construct topological structures for input point cloud sequences and encodes them as human static appearance feature vectors, introducing inherent frame-level parallelism to avoid the loss of spatial information. Then, we extend the technique for static point clouds by integrating the motion information of points between adjacent frames into the topological graph structure, capturing the long-term spatio-temporal evolution of human static appearance while preserving its spatial structure. Moreover, in order to enhance the semantic representation of the point cloud sequences, PointDMIG reconstructs the downsampled point set in the feature extraction process, further enriching the spatio-temporal information of human body movements. Experimental results on NTU RGB+D 60 and MSR Action 3D show that PointDMIG significantly improves the accuracy of 3D human action recognition based on point cloud sequences. We also performed an extended experiment on gesture recognition on the SHREC 2017 dataset, and PointDMIG achieved competitive results.</p>\",\"PeriodicalId\":51138,\"journal\":{\"name\":\"Multimedia Systems\",\"volume\":\"177 1\",\"pages\":\"\"},\"PeriodicalIF\":3.5000,\"publicationDate\":\"2024-06-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Multimedia Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1007/s00530-024-01395-9\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Multimedia Systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s00530-024-01395-9","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 0

摘要

点云包含丰富的空间信息，可为动作识别提供有效的补充线索。现有的基于点云序列的动作识别算法通常采用复杂的时空局部编码来捕捉时空特征，导致空间信息丢失，无法建立长期的空间相关性。在本文中，我们提出了一种 PointDMIG 网络，在保留空间结构信息的同时，对点云序列中的长期时空相关性进行建模。具体来说，我们首先采用基于图的静态点云技术来构建输入点云序列的拓扑结构，并将其编码为人类静态外观特征向量，同时引入固有的帧级并行性，以避免空间信息的丢失。然后，我们将相邻帧之间点的运动信息整合到拓扑图结构中，从而扩展了静态点云技术，在捕捉人体静态外观的长期时空演变的同时保留其空间结构。此外，为了增强点云序列的语义表示，PointDMIG 在特征提取过程中重建了降采样点集，进一步丰富了人体运动的时空信息。在 NTU RGB+D 60 和 MSR Action 3D 上的实验结果表明，PointDMIG 显著提高了基于点云序列的 3D 人体动作识别的准确性。我们还在 SHREC 2017 数据集上进行了手势识别的扩展实验，PointDMIG 取得了具有竞争力的结果。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

PointDMIG: a dynamic motion-informed graph neural network for 3D action recognition

查看原文本刊更多论文

PointDMIG: a dynamic motion-informed graph neural network for 3D action recognition

Point cloud contains rich spatial information, providing effective supplementary clues for action recognition. Existing action recognition algorithms based on point cloud sequences typically employ complex spatiotemporal local encoding to capture the spatiotemporal features, leading to the loss of spatial information and the inability to establish long-term spatial correlation. In this paper, we propose a PointDMIG network that models the long-term spatio-temporal correlation in point cloud sequences while retaining spatial structure information. Specifically, we first employ graph-based static point cloud techniques to construct topological structures for input point cloud sequences and encodes them as human static appearance feature vectors, introducing inherent frame-level parallelism to avoid the loss of spatial information. Then, we extend the technique for static point clouds by integrating the motion information of points between adjacent frames into the topological graph structure, capturing the long-term spatio-temporal evolution of human static appearance while preserving its spatial structure. Moreover, in order to enhance the semantic representation of the point cloud sequences, PointDMIG reconstructs the downsampled point set in the feature extraction process, further enriching the spatio-temporal information of human body movements. Experimental results on NTU RGB+D 60 and MSR Action 3D show that PointDMIG significantly improves the accuracy of 3D human action recognition based on point cloud sequences. We also performed an extended experiment on gesture recognition on the SHREC 2017 dataset, and PointDMIG achieved competitive results.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Multimedia Systems 工程技术-计算机：理论方法

CiteScore

5.40

自引率

7.70%

发文量

148

审稿时长

4.5 months

期刊介绍： This journal details innovative research ideas, emerging technologies, state-of-the-art methods and tools in all aspects of multimedia computing, communication, storage, and applications. It features theoretical, experimental, and survey articles.