Multi-View Hand Reconstruction With a Point-Embedded Transformer

IF 18.6
Lixin Yang;Licheng Zhong;Pengxiang Zhu;Xinyu Zhan;Junxiao Kong;Jian Xu;Cewu Lu
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 11, pp. 10680-10695
Publication date: 2025-08-13
Publication type: Journal Article
DOI: 10.1109/TPAMI.2025.3598089
URL: https://ieeexplore.ieee.org/document/11123707/

Abstract

This work introduces a novel and generalizable multi-view Hand Mesh Reconstruction (HMR) model, named POEM, designed for practical use in real-world hand motion capture scenarios. POEM advances on two fronts. First, in how the problem is modeled, we propose embedding static basis points within the multi-view stereo space. A point is a natural form of 3D information and an ideal medium for fusing features across views, because it projects to a different location in each view. Our method therefore builds on a simple yet effective idea: a complex 3D hand mesh can be represented by a set of 3D basis points that 1) are embedded in the multi-view stereo space, 2) carry features from the multi-view images, and 3) enclose the hand. The second advance lies in the training strategy. We combine five large-scale multi-view datasets and randomize the number, order, and poses of the cameras. Trained on this volume of data and this diversity of camera configurations, our model demonstrates notable generalizability in real-world applications. As a result, POEM offers a highly practical, plug-and-play solution that enables user-friendly, cost-effective multi-view motion capture for both left and right hands.
The model and source code are available at https://github.com/JubSteven/POEM-v2.
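The core idea of the abstract, that a 3D point collects features from every view through its own projections, can be illustrated with a minimal NumPy sketch. This is not the POEM implementation: the function names (`project_points`, `sample_nearest`, `fuse_point_features`, `randomize_views`) are hypothetical, a pinhole camera model is assumed, and a plain average stands in for the transformer-based fusion the paper describes. The last helper only mimics the randomization of camera number and order; pose randomization is omitted.

```python
import numpy as np

def project_points(points, K, R, t):
    """Project world-space points (N, 3) to pixels (N, 2); also return camera-frame depth."""
    cam = points @ R.T + t                      # world frame -> camera frame
    pix = cam @ K.T                             # camera frame -> homogeneous pixels
    uv = pix[:, :2] / np.maximum(pix[:, 2:3], 1e-6)
    return uv, cam[:, 2]

def sample_nearest(feat, uv):
    """Nearest-neighbour lookup of a feature map (H, W, C) at pixel coordinates (N, 2)."""
    h, w, _ = feat.shape
    x = np.clip(np.rint(uv[:, 0]).astype(int), 0, w - 1)
    y = np.clip(np.rint(uv[:, 1]).astype(int), 0, h - 1)
    return feat[y, x]

def fuse_point_features(points, feats, Ks, Rs, ts):
    """Average each point's sampled features over the views that actually see it.

    Assumption: mean pooling replaces the learned fusion used by the real model.
    """
    total = np.zeros((points.shape[0], feats[0].shape[-1]))
    count = np.zeros((points.shape[0], 1))
    for feat, K, R, t in zip(feats, Ks, Rs, ts):
        h, w, _ = feat.shape
        uv, depth = project_points(points, K, R, t)
        inside = (uv[:, 0] >= 0) & (uv[:, 0] <= w - 1) & (uv[:, 1] >= 0) & (uv[:, 1] <= h - 1)
        valid = ((depth > 1e-6) & inside)[:, None]
        total += np.where(valid, sample_nearest(feat, uv), 0.0)
        count += valid
    return total / np.maximum(count, 1)

def randomize_views(n_views, rng, min_views=2):
    """Return a random-length, randomly ordered subset of view indices."""
    k = rng.integers(min_views, n_views + 1)    # upper bound is exclusive
    return rng.permutation(n_views)[:k]

# Toy setup: two cameras with the same intrinsics and constant feature maps.
K = np.array([[100.0, 0.0, 32.0], [0.0, 100.0, 32.0], [0.0, 0.0, 1.0]])
Rs = [np.eye(3), np.eye(3)]
ts = [np.array([0.0, 0.0, 1.0]), np.array([0.05, 0.0, 1.0])]
feats = [np.full((64, 64, 4), 1.0), np.full((64, 64, 4), 3.0)]

rng = np.random.default_rng(0)
points = rng.uniform(-0.1, 0.1, size=(16, 3))   # basis points around the origin
fused = fuse_point_features(points, feats, [K, K], Rs, ts)
# Every point is visible in both views, so each fused entry is the mean of 1.0 and 3.0.
```

Because the average is symmetric in its inputs, the fused features in this sketch do not depend on the order or number of views, which is the property that makes randomizing the camera configuration during training straightforward to apply.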