VMarker-Pro: Probabilistic 3D Human Mesh Estimation From Virtual Markers

Impact Factor: 18.6
Xiaoxuan Ma, Jiajun Su, Yuan Xu, Wentao Zhu, Chunyu Wang, Yizhou Wang
DOI: 10.1109/TPAMI.2025.3535538
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 5, pp. 3731-3747
Published: 2025-01-28
URL: https://ieeexplore.ieee.org/document/10856385/
Citations: 0

Abstract

Monocular 3D human mesh estimation is challenging due to depth ambiguity and the difficulty of mapping images to complex parameter spaces. Recent methods use 3D poses as a proxy representation, which often discards crucial body shape information and leads to mediocre performance. Conversely, advanced motion capture systems, though accurate, are impractical for markerless in-the-wild images. To address these limitations, we introduce an intermediate representation called virtual markers, which are learned from large-scale mocap data and mimic the effects of physical markers. Building on virtual markers, we propose VMarker, which detects virtual markers in wild images; the intact mesh with realistic shape can then be obtained by simple interpolation from these markers. To handle occlusions that obscure 3D virtual marker estimation, we further enhance our method with VMarker-Pro, a probabilistic framework that models the distribution of 3D virtual marker positions using diffusion models, enabling the generation of multiple plausible meshes aligned with the image for robust 3D mesh estimation. Our approaches surpass existing methods on three benchmark datasets, with particularly large improvements on the SURREAL dataset, which features diverse body shapes. Additionally, VMarker-Pro excels at modeling the data distribution, significantly improving performance in occluded scenarios.
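The abstract describes recovering the full mesh "by simple interpolation" from the detected virtual markers. A minimal sketch of that idea, assuming the mesh is a SMPL-style surface with 6890 vertices, a hypothetical budget of 64 virtual markers, and a learned interpolation matrix (here replaced by a random placeholder with convex row weights):

```python
import numpy as np

# Hypothetical sizes: a SMPL-style mesh has 6890 vertices;
# suppose the method uses K = 64 virtual markers.
NUM_VERTICES, NUM_MARKERS = 6890, 64

# In the paper the interpolation matrix would be learned from mocap data;
# here it is a random placeholder, normalized so each vertex is a convex
# combination of marker positions.
rng = np.random.default_rng(0)
A = rng.random((NUM_VERTICES, NUM_MARKERS))
A /= A.sum(axis=1, keepdims=True)

# Detected 3D virtual marker positions (K x 3), placeholder values standing
# in for the network's per-image predictions.
markers = rng.standard_normal((NUM_MARKERS, 3))

# Mesh recovery is a single linear map: every vertex is a weighted
# combination of the marker positions.
mesh = A @ markers
print(mesh.shape)  # (6890, 3)
```

Because the map is linear, any distribution over marker positions (e.g. multiple diffusion samples in VMarker-Pro) induces a corresponding set of plausible meshes by applying the same matrix to each sample.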