{"title":"LGNet: Local-And-Global Feature Adaptive Network for Single Image Two-Hand Reconstruction","authors":"Haowei Xue, Meili Wang","doi":"10.1002/cav.70021","DOIUrl":null,"url":null,"abstract":"<div>\n \n <p>Accurate 3D interacting hand mesh reconstruction from RGB images is crucial for applications such as robotics, augmented reality (AR), and virtual reality (VR). Especially in the field of robotics, accurate interacting hand mesh reconstruction can significantly improve the accuracy and naturalness of human-robot interaction. This task requires an accurate understanding of complex interactions between two hands and ensuring reasonable alignment of the hand mesh with the image. Recent Transformer-based methods directly utilize the features of the two hands as input tokens, ignoring the correlation between local and global features of the interacting hands, leading to hand ambiguity, self-occlusion, and self-similarity problems. We propose LGNet, Local and Global Feature Adaptive Network, through separating the hand mesh reconstruction process into three stages: A joint stage for predicting hand joints; a mesh stage for predicting a rough hand mesh; and a refine stage for fine-tuning the mesh-image alignment using an offset mesh. LGNet enables high-quality fingertip-level mesh-image alignment, effectively models the spatial relationship between two hands, and supports real-time prediction. Comprehensive quantitative and qualitative evaluations on benchmark datasets reveal that LGNet surpasses existing methods in mesh accuracy and alignment accuracy, while also showcasing robust generalization performance in tests on in-the-wild images.</p>\n </div>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"36 4","pages":""},"PeriodicalIF":0.9000,"publicationDate":"2025-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Animation and Virtual Worlds","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/cav.70021","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
Abstract
Accurate 3D interacting hand mesh reconstruction from RGB images is crucial for applications such as robotics, augmented reality (AR), and virtual reality (VR). In robotics especially, accurate reconstruction of interacting hand meshes can significantly improve the accuracy and naturalness of human-robot interaction. The task requires an accurate understanding of the complex interactions between two hands and reasonable alignment of the hand mesh with the image. Recent Transformer-based methods use the features of the two hands directly as input tokens, ignoring the correlation between the local and global features of the interacting hands, which leads to problems of hand ambiguity, self-occlusion, and self-similarity. We propose LGNet, a Local-and-Global Feature Adaptive Network, which separates the hand mesh reconstruction process into three stages: a joint stage that predicts hand joints; a mesh stage that predicts a coarse hand mesh; and a refinement stage that fine-tunes mesh-image alignment using an offset mesh. LGNet enables high-quality fingertip-level mesh-image alignment, effectively models the spatial relationship between the two hands, and supports real-time prediction. Comprehensive quantitative and qualitative evaluations on benchmark datasets show that LGNet surpasses existing methods in mesh accuracy and alignment accuracy, while also demonstrating robust generalization on in-the-wild images.
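
The three-stage coarse-to-fine design is the core architectural idea the abstract describes. Below is a minimal PyTorch sketch of how such a pipeline could be wired together. All class names, feature dimensions, and the simple linear regression heads are illustrative assumptions: the abstract names only the three stages, and the paper's actual local-and-global adaptive feature fusion is not reproduced here. The joint and vertex counts follow the common MANO convention (21 joints and 778 vertices per hand), which is an assumption about this paper.

```python
import torch
import torch.nn as nn

NUM_JOINTS = 21 * 2   # assumed: 21 joints per hand, two hands (MANO convention)
NUM_VERTS = 778 * 2   # assumed: 778 MANO vertices per hand

class JointStage(nn.Module):
    """Hypothetical joint stage: regresses 3D joints from image features."""
    def __init__(self, feat_dim=2048):
        super().__init__()
        self.head = nn.Linear(feat_dim, NUM_JOINTS * 3)

    def forward(self, feats):                       # feats: (B, feat_dim)
        return self.head(feats).view(-1, NUM_JOINTS, 3)

class MeshStage(nn.Module):
    """Hypothetical mesh stage: coarse two-hand mesh from features + joints."""
    def __init__(self, feat_dim=2048):
        super().__init__()
        self.head = nn.Linear(feat_dim + NUM_JOINTS * 3, NUM_VERTS * 3)

    def forward(self, feats, joints):
        x = torch.cat([feats, joints.flatten(1)], dim=1)
        return self.head(x).view(-1, NUM_VERTS, 3)

class RefineStage(nn.Module):
    """Hypothetical refinement stage: predicts an offset mesh, added to the
    coarse mesh to improve mesh-image alignment."""
    def __init__(self, feat_dim=2048):
        super().__init__()
        self.head = nn.Linear(feat_dim + NUM_VERTS * 3, NUM_VERTS * 3)

    def forward(self, feats, coarse_mesh):
        x = torch.cat([feats, coarse_mesh.flatten(1)], dim=1)
        offsets = self.head(x).view(-1, NUM_VERTS, 3)
        return coarse_mesh + offsets                # refined mesh

class ThreeStagePipeline(nn.Module):
    """Chains the three stages: joints -> coarse mesh -> refined mesh."""
    def __init__(self, feat_dim=2048):
        super().__init__()
        self.joint_stage = JointStage(feat_dim)
        self.mesh_stage = MeshStage(feat_dim)
        self.refine_stage = RefineStage(feat_dim)

    def forward(self, feats):
        joints = self.joint_stage(feats)
        coarse = self.mesh_stage(feats, joints)
        refined = self.refine_stage(feats, coarse)
        return joints, coarse, refined

# Usage: feats would come from an image backbone (e.g., a ResNet).
feats = torch.randn(2, 2048)
joints, coarse, refined = ThreeStagePipeline()(feats)
print(joints.shape, coarse.shape, refined.shape)  # (2,42,3) (2,1556,3) (2,1556,3)
```

The sketch illustrates why the stages are ordered this way: joints constrain the coarse mesh, and the residual offset mesh lets the final stage correct fine misalignments without re-predicting the whole surface.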
Journal Introduction
With the advent of very powerful PCs and high-end graphics cards, there has been remarkable development in Virtual Worlds, real-time computer animation and simulation, and games. At the same time, new and cheaper Virtual Reality devices have appeared, allowing interaction with these real-time Virtual Worlds and even with real worlds through Augmented Reality. Three-dimensional characters, especially Virtual Humans, are now of exceptional quality, which allows them to be used in the movie industry. But this is only the beginning: with the development of Artificial Intelligence and agent technology, these characters will become increasingly autonomous and even intelligent. They will inhabit Virtual Worlds in a Virtual Life together with animals and plants.