Graph Convolutional Neural Networks-based 3D Hand Pose Estimation over Point Clouds
John Alejandro Castro-Vargas, P. Martinez-Gonzalez, Sergiu Oprea, Alberto Garcia-Garcia, S. Orts-Escolano, J. Garcia-Rodriguez
2021 International Joint Conference on Neural Networks (IJCNN), published 2021-07-18
DOI: 10.1109/IJCNN52387.2021.9533565 (https://doi.org/10.1109/IJCNN52387.2021.9533565)
Abstract
In recent years, a multitude of approaches have been proposed to estimate the 3D pose of the hand. Most of them estimate the pose from RGB images, and some also include geometric information via depth maps. Furthermore, some proposals have shown promising results using point clouds as input data. However, the sparse nature of this type of data is often one of its drawbacks. To tackle this sparsity, different strategies have been proposed, such as voxelizing or sorting the input data to impose a structure on the input domain. In this paper, we address this problem by means of a graph structure. This requires mapping the point cloud onto a graph representation that connects its points. We connect each point to its neighborhood, a method that has been used successfully in similar proposals and whose clustering effect lets us emulate the behavior of kernels in image convolutions. The proposed architecture uses both graph and 2D convolutions. The graph convolutions extract local features and build a feature map, from which the 2D convolutions extract a second level of features used to estimate the pose. This proposal shows initial results for estimating the 3D pose of the hand from depth maps, which are projected into point clouds and redefined as graphs. Although the results diverge from those of more established state-of-the-art methods, the proposal presents a proof of concept for addressing this problem without losing spatial information.
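The data preparation described in the abstract (depth map, then point cloud, then neighborhood graph) can be illustrated with a minimal sketch. This is not the authors' implementation: the pinhole intrinsics `fx`, `fy`, `cx`, `cy`, the neighborhood size `k`, the brute-force neighbor search, and the max-pooling in `aggregate_neighbors` are all assumptions chosen for illustration, and the function names are hypothetical. The max-pooling step is only a stand-in for a graph convolution, meant to show how a fixed-size neighborhood plays a role similar to a kernel window in an image convolution.

```python
import numpy as np


def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth map (H, W) into an (N, 3) point cloud.

    Assumes a pinhole camera model; pixels with depth <= 0 are
    treated as invalid and dropped.
    """
    h, w = depth.shape
    v, u = np.indices((h, w))            # row (v) and column (u) of each pixel
    valid = depth > 0
    z = depth[valid].astype(np.float64)
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)


def knn_graph(points, k):
    """Return an (N, k) array holding the k nearest neighbors of each point.

    Brute-force O(N^2) search, adequate only for small clouds.
    """
    diff = points[:, None, :] - points[None, :, :]
    dist2 = np.einsum("ijk,ijk->ij", diff, diff)   # squared distances
    np.fill_diagonal(dist2, np.inf)                # a point is not its own neighbor
    return np.argsort(dist2, axis=1)[:, :k]


def aggregate_neighbors(features, neighbors):
    """Max-pool (N, F) features over each point's neighborhood.

    Illustrative stand-in for a graph convolution, not the paper's layer.
    """
    return features[neighbors].max(axis=1)         # (N, k, F) -> (N, F)


if __name__ == "__main__":
    depth = np.zeros((4, 4))
    depth[1:3, 1:3] = 1.0                          # a 2x2 patch of valid depth
    points = depth_to_point_cloud(depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
    neighbors = knn_graph(points, k=2)
    pooled = aggregate_neighbors(points, neighbors)
    print(points.shape, neighbors.shape, pooled.shape)
```

For realistic cloud sizes, the brute-force search would normally be replaced by a spatial index such as a k-d tree. The per-point features produced by the aggregation could then be arranged into a feature map for the 2D convolutions the abstract mentions, although the abstract does not specify how that arrangement is done.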