GTIGNet: Global Topology Interaction Graphormer Network for 3D hand pose estimation
Yanjun Liu, Wanshu Fan, Cong Wang, Shixi Wen, Xin Yang, Qiang Zhang, Xiaopeng Wei, Dongsheng Zhou
Neural Networks, Volume 185, Article 107221. Published 2025-02-04. DOI: 10.1016/j.neunet.2025.107221
https://www.sciencedirect.com/science/article/pii/S0893608025001005
Abstract
Estimating 3D hand poses from monocular RGB images poses several challenges, including complex hand structures, self-occlusions, and depth ambiguities. Existing methods often fail to capture the long-range dependencies among skeletal and non-skeletal connections between hand joints. To address these limitations, we introduce the Global Topology Interaction Graphormer Network (GTIGNet), a novel deep learning architecture designed to improve 3D hand pose estimation. Our model incorporates a Context-Aware Attention Block (CAAB) within the 2D pose estimator to enhance the extraction of multi-scale features, yielding more accurate 2D joint heatmaps to support the subsequent task. Additionally, we introduce a High-Order Graphormer that explicitly and implicitly models the topological structure of hand joints, thereby enhancing feature interaction. Ablation studies confirm the effectiveness of our approach, and experimental results on four challenging datasets, the Rendered Hand Dataset (RHD), the Stereo Hand Pose Benchmark (STB), the First-Person Hand Action Benchmark (FPHA), and the FreiHAND Dataset, indicate that GTIGNet achieves state-of-the-art performance in 3D hand pose estimation. Notably, our model achieves a Mean Per Joint Position Error (MPJPE) of 9.98 mm on RHD, 6.12 mm on STB, 11.15 mm on FPHA, and 10.97 mm on FreiHAND.
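For readers unfamiliar with the metric reported above, MPJPE is commonly computed as the mean Euclidean distance between predicted and ground-truth 3D joint positions. The following is a minimal sketch of that definition, not the authors' evaluation code: the function name `mpjpe` and the toy joint coordinates are assumptions for illustration, and benchmark protocols may additionally apply root-joint or scale alignment, which is omitted here.

```python
import math


def mpjpe(pred, target):
    """Mean Per Joint Position Error.

    pred, target: equal-length sequences of (x, y, z) joint positions,
    in the same unit (e.g. mm). Returns the average per-joint
    Euclidean distance in that unit.
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and the same length")
    # Sum the per-joint Euclidean distances, then average over joints.
    total = sum(math.dist(p, t) for p, t in zip(pred, target))
    return total / len(pred)


# Hypothetical toy data: three joints, coordinates in mm (not from the paper).
pred = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 3.0, 4.0)]
target = [(0.0, 0.0, 0.0), (10.0, 0.0, 2.0), (0.0, 0.0, 0.0)]

# Per-joint errors are 0, 2, and 5 mm, so the mean is 7/3 mm.
error_mm = mpjpe(pred, target)
print(f"MPJPE: {error_mm:.2f} mm")
```

On a real dataset the per-sample values would typically be averaged over all test frames to obtain a single figure comparable to those quoted in the abstract.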
About the journal:
Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically-inspired artificial intelligence.