{"title":"LGNet: Local-And-Global Feature Adaptive Network for Single Image Two-Hand Reconstruction","authors":"Haowei Xue, Meili Wang","doi":"10.1002/cav.70021","DOIUrl":null,"url":null,"abstract":"<div>\n \n <p>Accurate 3D interacting hand mesh reconstruction from RGB images is crucial for applications such as robotics, augmented reality (AR), and virtual reality (VR). Especially in the field of robotics, accurate interacting hand mesh reconstruction can significantly improve the accuracy and naturalness of human-robot interaction. This task requires an accurate understanding of complex interactions between two hands and ensuring reasonable alignment of the hand mesh with the image. Recent Transformer-based methods directly utilize the features of the two hands as input tokens, ignoring the correlation between local and global features of the interacting hands, leading to hand ambiguity, self-occlusion, and self-similarity problems. We propose LGNet, Local and Global Feature Adaptive Network, through separating the hand mesh reconstruction process into three stages: A joint stage for predicting hand joints; a mesh stage for predicting a rough hand mesh; and a refine stage for fine-tuning the mesh-image alignment using an offset mesh. LGNet enables high-quality fingertip-level mesh-image alignment, effectively models the spatial relationship between two hands, and supports real-time prediction. Comprehensive quantitative and qualitative evaluations on benchmark datasets reveal that LGNet surpasses existing methods in mesh accuracy and alignment accuracy, while also showcasing robust generalization performance in tests on in-the-wild images.</p>\n </div>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"36 4","pages":""},"PeriodicalIF":0.9000,"publicationDate":"2025-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Animation and Virtual Worlds","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/cav.70021","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
Abstract
Accurate 3D interacting hand mesh reconstruction from RGB images is crucial for applications such as robotics, augmented reality (AR), and virtual reality (VR). In robotics especially, accurate reconstruction of interacting hand meshes can significantly improve the accuracy and naturalness of human-robot interaction. The task requires an accurate understanding of the complex interactions between two hands and reasonable alignment of the hand mesh with the image. Recent Transformer-based methods use the features of the two hands directly as input tokens, ignoring the correlation between the local and global features of the interacting hands, which leads to problems of hand ambiguity, self-occlusion, and self-similarity. We propose LGNet, a Local-and-Global Feature Adaptive Network, which separates the hand mesh reconstruction process into three stages: a joint stage that predicts hand joints; a mesh stage that predicts a coarse hand mesh; and a refinement stage that fine-tunes mesh-image alignment using an offset mesh. LGNet enables high-quality fingertip-level mesh-image alignment, effectively models the spatial relationship between the two hands, and supports real-time prediction. Comprehensive quantitative and qualitative evaluations on benchmark datasets show that LGNet surpasses existing methods in mesh accuracy and alignment accuracy, while also demonstrating robust generalization on in-the-wild images.
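
The three-stage coarse-to-fine design is the core architectural idea the abstract describes. Below is a minimal PyTorch sketch of how such a pipeline could be wired together. All class names, feature dimensions, and the simple linear regression heads are illustrative assumptions: the abstract names only the three stages, and the paper's actual local-and-global adaptive feature fusion is not reproduced here. The joint and vertex counts follow the common MANO convention (21 joints and 778 vertices per hand), which is an assumption about this paper.

```python
import torch
import torch.nn as nn

NUM_JOINTS = 21 * 2   # assumed: 21 joints per hand, two hands (MANO convention)
NUM_VERTS = 778 * 2   # assumed: 778 MANO vertices per hand

class JointStage(nn.Module):
    """Hypothetical joint stage: regresses 3D joints from image features."""
    def __init__(self, feat_dim=2048):
        super().__init__()
        self.head = nn.Linear(feat_dim, NUM_JOINTS * 3)

    def forward(self, feats):                       # feats: (B, feat_dim)
        return self.head(feats).view(-1, NUM_JOINTS, 3)

class MeshStage(nn.Module):
    """Hypothetical mesh stage: coarse two-hand mesh from features + joints."""
    def __init__(self, feat_dim=2048):
        super().__init__()
        self.head = nn.Linear(feat_dim + NUM_JOINTS * 3, NUM_VERTS * 3)

    def forward(self, feats, joints):
        x = torch.cat([feats, joints.flatten(1)], dim=1)
        return self.head(x).view(-1, NUM_VERTS, 3)

class RefineStage(nn.Module):
    """Hypothetical refinement stage: predicts an offset mesh, added to the
    coarse mesh to improve mesh-image alignment."""
    def __init__(self, feat_dim=2048):
        super().__init__()
        self.head = nn.Linear(feat_dim + NUM_VERTS * 3, NUM_VERTS * 3)

    def forward(self, feats, coarse_mesh):
        x = torch.cat([feats, coarse_mesh.flatten(1)], dim=1)
        offsets = self.head(x).view(-1, NUM_VERTS, 3)
        return coarse_mesh + offsets                # refined mesh

class ThreeStagePipeline(nn.Module):
    """Chains the three stages: joints -> coarse mesh -> refined mesh."""
    def __init__(self, feat_dim=2048):
        super().__init__()
        self.joint_stage = JointStage(feat_dim)
        self.mesh_stage = MeshStage(feat_dim)
        self.refine_stage = RefineStage(feat_dim)

    def forward(self, feats):
        joints = self.joint_stage(feats)
        coarse = self.mesh_stage(feats, joints)
        refined = self.refine_stage(feats, coarse)
        return joints, coarse, refined

# Usage: feats would come from an image backbone (e.g., a ResNet).
feats = torch.randn(2, 2048)
joints, coarse, refined = ThreeStagePipeline()(feats)
print(joints.shape, coarse.shape, refined.shape)  # (2,42,3) (2,1556,3) (2,1556,3)
```

The sketch illustrates why the stages are ordered this way: joints constrain the coarse mesh, and the residual offset mesh lets the final stage correct fine misalignments without re-predicting the whole surface.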
Journal Introduction
With the advent of very powerful PCs and high-end graphics cards, there has been remarkable development in Virtual Worlds, real-time computer animation and simulation, and games. At the same time, new and cheaper Virtual Reality devices have appeared, allowing interaction with these real-time Virtual Worlds and even with real worlds through Augmented Reality. Three-dimensional characters, especially Virtual Humans, are now of exceptional quality, which allows them to be used in the movie industry. But this is only the beginning: with the development of Artificial Intelligence and agent technology, these characters will become increasingly autonomous and even intelligent. They will inhabit Virtual Worlds in a Virtual Life together with animals and plants.