SAR: Spatial-Aware Regression for 3D Hand Pose and Mesh Reconstruction from a Monocular RGB Image

Xiaozheng Zheng, Pengfei Ren, Haifeng Sun, Jingyu Wang, Q. Qi, J. Liao
{"title":"SAR: Spatial-Aware Regression for 3D Hand Pose and Mesh Reconstruction from a Monocular RGB Image","authors":"Xiaozheng Zheng, Pengfei Ren, Haifeng Sun, Jingyu Wang, Q. Qi, J. Liao","doi":"10.1109/ismar52148.2021.00024","DOIUrl":null,"url":null,"abstract":"3D hand reconstruction is a popular research topic in recent years, which has great potential for VR/AR applications. However, due to the limited computational resource of VR/AR equipment, the reconstruction algorithm must balance accuracy and efficiency to make the users have a good experience. Nevertheless, current methods are not doing well in balancing accuracy and efficiency. Therefore, this paper proposes a novel framework that can achieve a fast and accurate 3D hand reconstruction. Our framework relies on three essential modules, including spatial-aware initial graph building (SAIGB), graph convolutional network (GCN) based belief maps regression (GBBMR), and pose-guided refinement (PGR). At first, given image feature maps extracted by convolutional neural networks, SAIGB builds a spatial-aware and compact initial feature graph. Each node in this graph represents a vertex of the mesh and has vertex-specific spatial information that is helpful for accurate and efficient regression. After that, GBBMR first utilizes adaptive-GCN to introduce interactions between vertices to capture short-range and long-range dependencies between vertices efficiently and flexibly. Then, it maps vertices’ features to belief maps that can model the uncertainty of predictions for more accurate predictions. Finally, we apply PGR to compress the redundant vertices’ belief maps to compact-joints’ belief maps with the pose guidance and use these joints’ belief maps to refine previous predictions better to obtain more accurate and robust reconstruction results. Our method achieves state-of-the-art performance on four public benchmarks, FreiHAND, HO-3D, RHD, and STB. Moreover, our method can run at a speed of two to three times that of previous state-of-the-art methods. Our code is available at https://github.com/zxz267/SAR.","PeriodicalId":395413,"journal":{"name":"2021 IEEE International Symposium on Mixed and Augmented Reality (ISMAR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE International Symposium on Mixed and Augmented Reality (ISMAR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ismar52148.2021.00024","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

3D hand reconstruction has been a popular research topic in recent years and has great potential for VR/AR applications. However, because VR/AR devices have limited computational resources, a reconstruction algorithm must balance accuracy and efficiency to give users a good experience, and current methods do not balance the two well. This paper therefore proposes a novel framework for fast and accurate 3D hand reconstruction. Our framework relies on three essential modules: spatial-aware initial graph building (SAIGB), graph convolutional network (GCN) based belief map regression (GBBMR), and pose-guided refinement (PGR). First, given image feature maps extracted by a convolutional neural network, SAIGB builds a compact, spatial-aware initial feature graph in which each node represents one mesh vertex and carries vertex-specific spatial information, which aids accurate and efficient regression. Next, GBBMR applies an adaptive GCN to introduce interactions between vertices, capturing both short-range and long-range dependencies efficiently and flexibly; it then maps the vertex features to belief maps that model the uncertainty of the predictions, yielding more accurate estimates. Finally, we apply PGR to compress the redundant per-vertex belief maps into compact per-joint belief maps under pose guidance, and use these joint belief maps to refine the earlier predictions, producing more accurate and robust reconstruction results. Our method achieves state-of-the-art performance on four public benchmarks (FreiHAND, HO-3D, RHD, and STB) and runs two to three times faster than previous state-of-the-art methods. Our code is available at https://github.com/zxz267/SAR.
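To make the three-stage pipeline concrete, below is a minimal PyTorch sketch of the architecture the abstract describes. Everything in it — class names, tensor shapes, the number of GCN layers, and the placeholder vertex-to-joint weights — is an illustrative assumption rather than the authors' implementation; the real code is in the linked repository.

```python
# Minimal sketch of the SAIGB -> GBBMR -> PGR pipeline described in the abstract.
# All shapes, layer counts, and weights are illustrative assumptions, not the
# authors' implementation (see https://github.com/zxz267/SAR for the real code).
import torch
import torch.nn as nn

V, J, C, H = 778, 21, 64, 32  # MANO vertices, hand joints, node channels, belief-map size


class SAIGB(nn.Module):
    """Spatial-aware initial graph building: turn backbone feature maps into a
    per-vertex feature graph. Each of the V nodes gets its own channel group,
    so every node carries vertex-specific spatial information."""

    def __init__(self, in_ch=256, grid=8):
        super().__init__()
        assert (V * C) % (grid * grid) == 0
        self.project = nn.Conv2d(in_ch, V * C // (grid * grid), 1)

    def forward(self, fmap):                    # fmap: (B, in_ch, grid, grid)
        x = self.project(fmap)                  # (B, V*C/grid^2, grid, grid)
        return x.flatten(1).view(-1, V, C)      # (B, V, C): one feature per vertex


class AdaptiveGCN(nn.Module):
    """One graph-convolution layer with a learnable dense adjacency, so vertex
    interactions are not restricted to mesh edges and can model both
    short-range and long-range dependencies."""

    def __init__(self):
        super().__init__()
        self.adj = nn.Parameter(torch.eye(V))   # initialized to self-connections
        self.fc = nn.Linear(C, C)

    def forward(self, x):                       # x: (B, V, C)
        return torch.relu(self.adj @ self.fc(x))


class GBBMR(nn.Module):
    """GCN-based belief map regression: refine vertex features with adaptive
    GCN layers, then regress one 2D belief map plus a depth value per vertex;
    the belief map models the uncertainty of that vertex's image location."""

    def __init__(self, layers=4):
        super().__init__()
        self.gcn = nn.Sequential(*[AdaptiveGCN() for _ in range(layers)])
        self.head = nn.Linear(C, H * H + 1)

    def forward(self, x):                       # x: (B, V, C)
        out = self.head(self.gcn(x))            # (B, V, H*H+1)
        belief = out[..., :H * H].softmax(-1).view(-1, V, H, H)
        return belief, out[..., -1]             # belief (B,V,H,H), depth (B,V)


def soft_argmax_2d(belief):
    """Decode normalized (u, v) coordinates from belief maps."""
    ys = torch.linspace(0, 1, belief.shape[-2]).view(1, 1, -1, 1)
    xs = torch.linspace(0, 1, belief.shape[-1]).view(1, 1, 1, -1)
    return torch.stack([(belief * xs).sum((-2, -1)),
                        (belief * ys).sum((-2, -1))], -1)   # (B, N, 2)


class PGR(nn.Module):
    """Pose-guided refinement: compress V redundant vertex belief maps into J
    compact joint belief maps with a fixed vertex-to-joint regressor (uniform
    placeholder weights here; MANO provides a real one), then predict a
    correction to the initial vertex estimates from the joint evidence."""

    def __init__(self):
        super().__init__()
        self.register_buffer("v2j", torch.full((J, V), 1.0 / V))  # placeholder
        self.refine = nn.Linear(J * 3, V * 3)

    def forward(self, belief, depth, verts):    # verts: (B, V, 3) initial estimate
        j_belief = torch.einsum("jv,bvhw->bjhw", self.v2j, belief)
        j_uv = soft_argmax_2d(j_belief)         # (B, J, 2)
        j_z = depth @ self.v2j.T                # (B, J)
        joints = torch.cat([j_uv, j_z.unsqueeze(-1)], -1)
        return verts + self.refine(joints.flatten(1)).view(-1, V, 3)


class SAR(nn.Module):
    """Backbone (omitted) -> SAIGB -> GBBMR -> soft-argmax -> PGR."""

    def __init__(self):
        super().__init__()
        self.saigb, self.gbbmr, self.pgr = SAIGB(), GBBMR(), PGR()

    def forward(self, fmap):                    # fmap: backbone features (B, 256, 8, 8)
        belief, depth = self.gbbmr(self.saigb(fmap))
        uv = soft_argmax_2d(belief)             # initial vertex image coordinates
        verts = torch.cat([uv, depth.unsqueeze(-1)], -1)
        return self.pgr(belief, depth, verts)   # refined mesh vertices (B, V, 3)


mesh = SAR()(torch.randn(2, 256, 8, 8))         # sanity check: (2, 778, 3)
```

The learnable dense adjacency in AdaptiveGCN is one common way to realize the "short-range and long-range dependencies" the abstract mentions, and the uniform v2j matrix stands in for whatever pose-guided vertex-to-joint mapping the paper actually uses.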