Graph Convolutional Neural Networks-based 3D Hand Pose Estimation over Point Clouds

2021 International Joint Conference on Neural Networks (IJCNN). Pub Date: 2021-07-18. DOI: 10.1109/IJCNN52387.2021.9533565

John Alejandro Castro-Vargas, P. Martinez-Gonzalez, Sergiu Oprea, Alberto Garcia-Garcia, S. Orts-Escolano, J. Garcia-Rodriguez



Abstract

In recent years, a multitude of approaches have been proposed to estimate the 3D pose of the hands. Most of them estimate the pose from RGB images, and some also include geometrical information via depth maps. Furthermore, some proposals have shown promising results using point clouds as input data. However, the sparse nature of this type of data is often one of its drawbacks. To tackle this sparsity, different strategies have been proposed, such as voxelizing or sorting the input data to impose a structure on the input domain. In this paper, we address this problem by means of a graph structure. This requires mapping the point cloud onto a graph representation that connects its points. We connect each point to its neighborhood, a method that has been used successfully in similar proposals and whose clustering effect lets us emulate the behavior of kernels in image convolutions. The proposed architecture uses both graph and 2D convolutions. The graph convolutions extract local features and build a feature map, from which the 2D convolutions extract a second level of features used to estimate the pose. We report initial results for estimating the 3D pose of the hand from depth maps, which are projected to point clouds and redefined as graphs. Although the results differ from those of more established state-of-the-art methods, the approach serves as a proof of concept for addressing this problem without losing spatial information.
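The preprocessing described in the abstract (depth map, then point cloud, then neighborhood graph) can be illustrated with a minimal sketch. The abstract does not give the camera model, the neighborhood criterion, or the neighborhood size, so everything specific here is an assumption: a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, a k-nearest-neighbor rule with a hypothetical `k`, and the function names `depth_to_point_cloud` and `knn_edges`. It is not the authors' implementation.

```python
import numpy as np


def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth map of shape (H, W) into an (N, 3) point cloud.

    Assumes a pinhole camera model; pixels with depth <= 0 are treated as
    missing and skipped. The intrinsics are hypothetical parameters.
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]          # row (v) and column (u) pixel indices
    valid = depth > 0
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)


def knn_edges(points, k):
    """Connect each point to its k nearest neighbors.

    Returns a (2, N * k) integer array of directed edges (source, neighbor).
    Brute-force O(N^2) distances, so only suitable for small clouds;
    requires k < N. The k-NN rule itself is an assumption of this sketch.
    """
    diff = points[:, None, :] - points[None, :, :]
    dist2 = np.einsum("ijk,ijk->ij", diff, diff)   # squared pairwise distances
    np.fill_diagonal(dist2, np.inf)                # exclude self-loops
    neighbors = np.argsort(dist2, axis=1)[:, :k]
    sources = np.repeat(np.arange(len(points)), k)
    return np.stack([sources, neighbors.ravel()])
```

The resulting edge list plays the role the abstract assigns to the neighborhood: a graph convolution aggregates features over each point's neighbors, much as an image kernel aggregates over a pixel's surrounding window. For realistic cloud sizes, a spatial index such as a k-d tree would replace the brute-force distance matrix.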

