Multi-Person 3D Pose and Shape Estimation via Inverse Kinematics and Refinement

Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision Pub Date : 2022-10-24 DOI:10.48550/arXiv.2210.13529

Junuk Cha, Muhammad Saqlain, Geonu Kim, Minjung Shin, Seungryul Baek

Volume: 32 1. Pages: 660-677. Publication type: Journal Article.

Citations: 5

Abstract

Estimating 3D poses and shapes in the form of meshes from monocular RGB images is challenging, and it is more difficult than estimating 3D poses only in the form of skeletons or heatmaps. When interacting persons are involved, 3D mesh reconstruction becomes even more challenging because of the ambiguity introduced by person-to-person occlusions. To tackle these challenges, we propose a coarse-to-fine pipeline that benefits from 1) inverse kinematics applied to occlusion-robust 3D skeleton estimates and 2) Transformer-based relation-aware refinement. In our pipeline, we first obtain occlusion-robust 3D skeletons for multiple persons from an RGB image. Then, we apply inverse kinematics to convert the estimated skeletons to deformable 3D mesh parameters. Finally, we apply Transformer-based mesh refinement, which refines the obtained mesh parameters by considering intra- and inter-person relations of the 3D meshes. Through extensive experiments, we demonstrate the effectiveness of our method, which outperforms state-of-the-art methods on the 3DPW, MuPoTS and AGORA datasets.
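The skeleton-to-rotation step can be illustrated with a toy example. The following is a minimal sketch, not the authors' implementation: it assumes a simple serial chain of joints (each joint is the parent of the next) and recovers, for each bone, the rotation that aligns its rest-pose direction with the estimated direction. The names `rotation_between`, `chain_ik` and `chain_fk` are hypothetical, and the sketch ignores twist about the bone axis, body shape parameters, and any learned refinement.

```python
import numpy as np


def rotation_between(a, b, eps=1e-8):
    """Rotation matrix R such that R @ (a/|a|) == b/|b| (shortest-arc rotation)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    v = np.cross(a, b)
    c = float(np.dot(a, b))
    if c < -1.0 + eps:
        # Antiparallel vectors: rotate by pi about any axis perpendicular to a.
        helper = np.array([1.0, 0.0, 0.0]) if abs(a[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
        n = np.cross(a, helper)
        n = n / np.linalg.norm(n)
        return 2.0 * np.outer(n, n) - np.eye(3)
    # Rodrigues formula written in terms of the cross-product matrix of v.
    K = np.array([[0.0, -v[2], v[1]],
                  [v[2], 0.0, -v[0]],
                  [-v[1], v[0], 0.0]])
    return np.eye(3) + K + (K @ K) / (1.0 + c)


def chain_ik(rest_joints, target_joints):
    """Local (parent-relative) bone rotations for a serial chain (assumed topology)."""
    rest = np.asarray(rest_joints, dtype=float)
    target = np.asarray(target_joints, dtype=float)
    local_rots = []
    parent_global = np.eye(3)
    for i in range(len(rest) - 1):
        global_rot = rotation_between(rest[i + 1] - rest[i], target[i + 1] - target[i])
        # Express the bone's global rotation relative to its parent's rotation.
        local_rots.append(parent_global.T @ global_rot)
        parent_global = global_rot
    return local_rots


def chain_fk(rest_joints, root, local_rots):
    """Forward kinematics: rebuild joint positions from the local rotations."""
    rest = np.asarray(rest_joints, dtype=float)
    joints = [np.asarray(root, dtype=float)]
    global_rot = np.eye(3)
    for i, local_rot in enumerate(local_rots):
        global_rot = global_rot @ local_rot
        joints.append(joints[-1] + global_rot @ (rest[i + 1] - rest[i]))
    return np.stack(joints)


# Toy leg: rest pose points straight down; the target bends at the second joint.
rest = [[0.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, -2.0, 0.0]]
target = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, -1.0, 0.0]]
rotations = chain_ik(rest, target)
reconstructed = chain_fk(rest, target[0], rotations)
```

When the target bone lengths match the rest pose, running forward kinematics on the recovered rotations reproduces the target joints. In a full mesh model, rotations of this kind would be one part of the mesh parameters, alongside shape parameters that this sketch does not estimate.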

