Daiheng Gao, Bang Zhang, Qi Wang, Xindi Zhang, Pan Pan, Yinghui Xu
SCAT: Stride Consistency with Auto-regressive regressor and Transformer for hand pose estimation
2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2021-10-01
DOI: https://doi.org/10.1109/ICCVW54120.2021.00256
Abstract
Current state-of-the-art monocular 3D hand pose estimation methods are mostly model-based. For instance, MANO, one of the most popular parametric hand models, can depict hand shapes and poses and is widely adopted for estimating hand poses in images and videos. However, MANO is derived from scanned hand data with limited shapes and poses, which constrains its ability to depict in-the-wild shape and pose variations. In this paper, we propose a 3D hand pose estimation approach that does not depend on any parametric hand model yet can still accurately estimate in-the-wild hand poses. Our approach, SCAT (Stride Consistency with Auto-regressive regressor and Transformer), offers a new representation for hand poses. The representation consists of a mean-shape hand template and 21 hand joint offsets that describe the 3D distances between the template and the hand to be estimated. In addition, SCAT generates a robust and smooth linear mapping between visual feature maps and the target 3D offsets, ensuring inter-frame smoothness and removing motion jitter. We also introduce an auto-regressive procedure for iteratively refining the hand pose estimate. Extensive experiments show that SCAT produces more accurate and smoother 3D hand pose estimates than state-of-the-art methods.
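The template-plus-offset representation and the iterative refinement can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no architectural details, so the function names (`apply_offsets`, `refine`, `toy_predict_delta`), the zero initialization, the step count, and the toy regressor standing in for the Transformer-based network are all assumptions made for illustration only.

```python
from typing import Callable, List

Joint = List[float]   # [x, y, z]
Pose = List[Joint]    # one entry per hand joint

NUM_JOINTS = 21       # joint count stated in the abstract


def apply_offsets(template: Pose, offsets: Pose) -> Pose:
    """Return the estimated pose: template joint plus its 3D offset."""
    if len(template) != NUM_JOINTS or len(offsets) != NUM_JOINTS:
        raise ValueError("expected 21 joints")
    return [[t + o for t, o in zip(tj, oj)]
            for tj, oj in zip(template, offsets)]


def refine(template: Pose,
           predict_delta: Callable[[Pose], Pose],
           steps: int = 3) -> Pose:
    """Hypothetical auto-regressive loop: each step feeds the current
    offset estimate back into the regressor and adds its correction."""
    offsets = [[0.0, 0.0, 0.0] for _ in range(NUM_JOINTS)]
    for _ in range(steps):
        delta = predict_delta(offsets)
        offsets = [[o + d for o, d in zip(oj, dj)]
                   for oj, dj in zip(offsets, delta)]
    return apply_offsets(template, offsets)


def toy_predict_delta(offsets: Pose) -> Pose:
    """Toy stand-in for a learned regressor (assumption): move each
    coordinate halfway toward a fixed target offset of 1.0."""
    return [[0.5 * (1.0 - o) for o in oj] for oj in offsets]


# Usage: an all-zero template refined for three steps.
template = [[0.0, 0.0, 0.0] for _ in range(NUM_JOINTS)]
pose = refine(template, toy_predict_delta, steps=3)
```

In this toy setup each refinement step halves the remaining error, which is only meant to show how feeding the previous estimate back into the regressor can converge on the target offsets; a real regressor would condition on image features as well.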