VMarker-Pro: Probabilistic 3D Human Mesh Estimation From Virtual Markers

Impact Factor: 18.6
Xiaoxuan Ma, Jiajun Su, Yuan Xu, Wentao Zhu, Chunyu Wang, Yizhou Wang
DOI: 10.1109/TPAMI.2025.3535538
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 5, pp. 3731-3747
Published: 2025-01-28
URL: https://ieeexplore.ieee.org/document/10856385/
Citations: 0

Abstract

Monocular 3D human mesh estimation is challenging due to depth ambiguity and the difficulty of mapping images to complex parameter spaces. Recent methods use 3D poses as a proxy representation, which often discards crucial body shape information and leads to mediocre performance. Conversely, advanced motion capture systems, though accurate, are impractical for markerless in-the-wild images. To address these limitations, we introduce an intermediate representation called virtual markers, which are learned from large-scale mocap data and mimic the effects of physical markers. Building on virtual markers, we propose VMarker, which detects virtual markers in wild images; the intact mesh with realistic shape can then be obtained by simple interpolation from these markers. To handle occlusions that obscure 3D virtual marker estimation, we further enhance our method with VMarker-Pro, a probabilistic framework that models the distribution of 3D virtual marker positions using diffusion models, enabling the generation of multiple plausible meshes aligned with the image for robust 3D mesh estimation. Our approaches surpass existing methods on three benchmark datasets, with particularly large improvements on the SURREAL dataset, which features diverse body shapes. Additionally, VMarker-Pro excels at modeling the data distribution, significantly improving performance in occluded scenarios.
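The abstract describes recovering the full mesh "by simple interpolation" from the detected virtual markers. A minimal sketch of that idea, assuming the mesh is a SMPL-style surface with 6890 vertices, a hypothetical budget of 64 virtual markers, and a learned interpolation matrix (here replaced by a random placeholder with convex row weights):

```python
import numpy as np

# Hypothetical sizes: a SMPL-style mesh has 6890 vertices;
# suppose the method uses K = 64 virtual markers.
NUM_VERTICES, NUM_MARKERS = 6890, 64

# In the paper the interpolation matrix would be learned from mocap data;
# here it is a random placeholder, normalized so each vertex is a convex
# combination of marker positions.
rng = np.random.default_rng(0)
A = rng.random((NUM_VERTICES, NUM_MARKERS))
A /= A.sum(axis=1, keepdims=True)

# Detected 3D virtual marker positions (K x 3), placeholder values standing
# in for the network's per-image predictions.
markers = rng.standard_normal((NUM_MARKERS, 3))

# Mesh recovery is a single linear map: every vertex is a weighted
# combination of the marker positions.
mesh = A @ markers
print(mesh.shape)  # (6890, 3)
```

Because the map is linear, any distribution over marker positions (e.g. multiple diffusion samples in VMarker-Pro) induces a corresponding set of plausible meshes by applying the same matrix to each sample.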