{"title":"MH-Net: Multiheaded 3D Hand Pose Estimation Network With 3D Anchorsets and Improved Multiscale Vision Transformer","authors":"Tekie Tsegay Tewolde;Ali Asghar Manjotho;Zhendong Niu","doi":"10.1109/TIV.2024.3387344","DOIUrl":null,"url":null,"abstract":"Accurate 3D hand pose estimation is a challenging computer vision problem primarily because of self-occlusion and viewpoint variations. Existing methods address viewpoint variations by applying data-centric transformations, such as data alignments or generating multiple views, which are prone to data sensitivity, error propagation, and prohibitive computational requirements. We improve the estimation accuracy by mitigating the impact of self-occlusion and viewpoint variations from the network side and propose MH-Net, a novel multiheaded network for accurate 3D hand pose estimation from a depth image. MH-Net comprises three key components. First, a multiscale feature extraction backbone based on an improved multiscale vision transformer (MViTv2) is proposed to extract shift-invariant global features. Second, a 3D anchorset generator is proposed to generate three disjoint sets of 3D anchors that serve two purposes: formulating hand pose estimation as an anchor-to-joint offset estimation and defining three unique viewpoints from a single depth image. Third, three identical regression heads are proposed to regress 3D joint positions based on unique viewpoints defined by their respective anchorsets. Extensive ablation studies have been conducted to investigate the impact of anchorsets, regression heads, and feature extraction backbones. Experiments on three public datasets, ICVL, MSRA, and NYU, show significant improvements over the state-of-the-art.","PeriodicalId":36532,"journal":{"name":"IEEE Transactions on Intelligent Vehicles","volume":"9 10","pages":"6660-6671"},"PeriodicalIF":14.0000,"publicationDate":"2024-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Intelligent Vehicles","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10496825/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Abstract
Accurate 3D hand pose estimation is a challenging computer vision problem primarily because of self-occlusion and viewpoint variations. Existing methods address viewpoint variations by applying data-centric transformations, such as data alignments or generating multiple views, which suffer from data sensitivity, error propagation, and prohibitive computational requirements. We improve the estimation accuracy by mitigating the impact of self-occlusion and viewpoint variations from the network side and propose MH-Net, a novel multiheaded network for accurate 3D hand pose estimation from a depth image. MH-Net comprises three key components. First, a multiscale feature extraction backbone based on an improved multiscale vision transformer (MViTv2) is proposed to extract shift-invariant global features. Second, a 3D anchorset generator is proposed to generate three disjoint sets of 3D anchors that serve two purposes: formulating hand pose estimation as an anchor-to-joint offset estimation and defining three unique viewpoints from a single depth image. Third, three identical regression heads are proposed to regress 3D joint positions based on the unique viewpoints defined by their respective anchorsets. Extensive ablation studies have been conducted to investigate the impact of anchorsets, regression heads, and feature extraction backbones. Experiments on three public datasets, ICVL, MSRA, and NYU, show significant improvements over the state-of-the-art.
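The anchor-to-joint offset formulation described above can be illustrated with a minimal sketch. This is an assumption-laden illustration, not the authors' implementation: the backbone below is a small CNN stub standing in for the improved MViTv2, anchor placement is random rather than the paper's 3D anchorset generation scheme, and the class names (BackboneStub, RegressionHead, MHNetSketch), anchorset size, joint count, and weighted-average fusion of the three heads are all hypothetical choices made only to show the overall data flow: features from a depth image, per-anchorset offset regression, and fusion into 3D joint positions.

```python
# Minimal, hypothetical sketch of MH-Net's anchor-to-joint offset regression.
# The MViTv2 backbone, anchorset generator, and head/fusion details are
# assumptions; only the overall structure follows the abstract.
import torch
import torch.nn as nn

NUM_JOINTS = 21       # illustrative joint count; the benchmark datasets differ
ANCHORS_PER_SET = 64  # assumed size of each of the three disjoint anchorsets


class BackboneStub(nn.Module):
    """Small CNN stand-in for the improved MViTv2 multiscale feature backbone."""

    def __init__(self, feat_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )

    def forward(self, depth):      # depth: (B, 1, H, W)
        return self.net(depth)     # (B, feat_dim)


class RegressionHead(nn.Module):
    """Regresses anchor-to-joint offsets and per-anchor weights for one anchorset."""

    def __init__(self, feat_dim, num_anchors, num_joints):
        super().__init__()
        self.num_anchors, self.num_joints = num_anchors, num_joints
        self.offsets = nn.Linear(feat_dim, num_anchors * num_joints * 3)
        self.weights = nn.Linear(feat_dim, num_anchors * num_joints)

    def forward(self, feat, anchors):  # anchors: (A, 3)
        B = feat.shape[0]
        off = self.offsets(feat).view(B, self.num_anchors, self.num_joints, 3)
        w = self.weights(feat).view(B, self.num_anchors, self.num_joints)
        w = torch.softmax(w, dim=1)            # normalize weights over anchors
        cand = anchors[None, :, None, :] + off  # candidate joints: (B, A, J, 3)
        return (w.unsqueeze(-1) * cand).sum(dim=1)  # weighted average: (B, J, 3)


class MHNetSketch(nn.Module):
    """Three identical heads, each tied to one of three disjoint 3D anchorsets."""

    def __init__(self, feat_dim=256):
        super().__init__()
        self.backbone = BackboneStub(feat_dim)
        # Random, fixed anchor positions; purely illustrative placement.
        self.anchorsets = nn.ParameterList(
            [nn.Parameter(torch.rand(ANCHORS_PER_SET, 3), requires_grad=False)
             for _ in range(3)]
        )
        self.heads = nn.ModuleList(
            [RegressionHead(feat_dim, ANCHORS_PER_SET, NUM_JOINTS) for _ in range(3)]
        )

    def forward(self, depth):
        feat = self.backbone(depth)
        poses = [h(feat, a) for h, a in zip(self.heads, self.anchorsets)]
        return torch.stack(poses).mean(dim=0)  # fuse the three head predictions


if __name__ == "__main__":
    model = MHNetSketch()
    joints = model(torch.randn(2, 1, 128, 128))
    print(joints.shape)  # torch.Size([2, 21, 3])
```

In this sketch each head sees the same global feature vector but regresses offsets relative to its own anchorset, which is one plausible way to realize the abstract's idea of three viewpoint-specific estimates from a single depth image; the actual anchor generation, offset weighting, and head fusion used by MH-Net are described in the paper itself.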
Journal Introduction:
The IEEE Transactions on Intelligent Vehicles (T-IV) is a premier platform for publishing peer-reviewed articles that present innovative research concepts, application results, significant theoretical findings, and application case studies in the field of intelligent vehicles. With a particular emphasis on automated vehicles within roadway environments, T-IV aims to raise awareness of pressing research and application challenges.
Our focus is on providing critical information to the intelligent vehicle community, serving as a dissemination vehicle for IEEE ITS Society members and others interested in state-of-the-art developments and progress in research and applications related to intelligent vehicles. Join us in advancing knowledge and innovation in this dynamic field.