3DPSR: An innovative approach for pose and shape refinement in 3D human meshes from a single 2D image
Mohit Kushwaha, Jaytrilok Choudhary, Dhirendra Pratap Singh
Image and Vision Computing, Vol. 152, Article 105311 (published 2024-10-30)
DOI: 10.1016/j.imavis.2024.105311
https://www.sciencedirect.com/science/article/pii/S0262885624004165
Citation count: 0
Abstract
3D human models are attracting considerable interest in computer vision, with applications in gaming, clothing parsing, avatar creation, and more. In these applications, a precise 3D human model with accurate shape and pose is crucial for realistic, high-quality results. We propose 3DPSR (3D Pose and Shape Refinements), an approach that reconstructs a precise 3D human mesh from a single 2D image with better-aligned pose and shape. 3DPSR consists of two modules, mesh deformation via pose fitting and mesh deformation via shape fitting, with the shape-fitting module acting as a refinement stage. Compared with existing methods, 3DPSR achieves lower MPVE (mean per-vertex error) and PA-MPJPE (Procrustes-aligned mean per-joint position error) and produces more accurate 3D human models. It significantly outperforms state-of-the-art human mesh reconstruction methods on standard and challenging datasets, including SURREAL, Human3.6M, and 3DPW, across scenarios with complex poses, establishing a new benchmark.
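The abstract reports results in MPVE and PA-MPJPE without defining them. Both are standard human-mesh-recovery metrics: MPJPE is the mean Euclidean distance between predicted and ground-truth 3D joints, MPVE is the same average taken over mesh vertices, and PA-MPJPE first aligns the prediction to the ground truth with a similarity (Procrustes) transform, removing global scale, rotation, and translation, so only articulated pose error remains. Lower is better for all three. The sketch below shows how these metrics are commonly computed with NumPy; it is not the authors' evaluation code, the function names are hypothetical, and benchmark-specific details (joint set, millimetre units, pelvis/root alignment before MPVE) are assumptions left out for brevity.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance between
    corresponding predicted and ground-truth points (same units as input)."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def mpve(pred_verts, gt_verts):
    """Mean per-vertex error: the same average distance, computed over
    mesh vertices instead of skeleton joints (no root alignment here)."""
    return mpjpe(pred_verts, gt_verts)


def procrustes_align(pred, gt):
    """Similarity-align pred (N x 3) to gt (N x 3) using Umeyama's method:
    find scale s, rotation R, translation t minimising
    ||s * pred @ R.T + t - gt||^2 and return the aligned prediction."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    # SVD of the cross-covariance gives the optimal rotation.
    u, sigma, vt = np.linalg.svd(g.T @ p)
    # Flip the last axis if needed so R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(u @ vt))
    D = np.diag([1.0, 1.0, d])
    r = u @ D @ vt
    s = np.trace(np.diag(sigma) @ D) / (p ** 2).sum()
    return s * p @ r.T + mu_g


def pa_mpjpe(pred, gt):
    """MPJPE after Procrustes alignment (scale, rotation, translation removed)."""
    return mpjpe(procrustes_align(pred, gt), gt)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.normal(size=(14, 3))  # 14 hypothetical joints
    theta = np.pi / 6
    rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                    [np.sin(theta), np.cos(theta), 0.0],
                    [0.0, 0.0, 1.0]])
    # Prediction = ground truth under a pure similarity transform.
    pred = 1.3 * gt @ rot.T + np.array([0.5, -0.2, 1.0])
    print(f"MPJPE    = {mpjpe(pred, gt):.4f}")
    print(f"PA-MPJPE = {pa_mpjpe(pred, gt):.6f}")
```

In the usage example the prediction differs from the ground truth only by scale, rotation, and translation, so MPJPE is large while PA-MPJPE is essentially zero. This is why papers report both: PA-MPJPE isolates pose accuracy, whereas MPVE also penalises errors in global placement and body shape.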
About the journal:
Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high-quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real-world scenes. It seeks to deepen understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.