{"title":"A Novel Transformer-based Framework for Multi-View 3D Human Mesh Reconstruction","authors":"Entao Chen, Bobo Ju, Linhua Jiang, Dongfang Zhao","doi":"10.1109/INSAI56792.2022.00042","DOIUrl":null,"url":null,"abstract":"This paper addresses two key problems of multi-view 3D Human Mesh Reconstruction (HMR): the difficulty of fusing features from multiple images and the lack of training data. We design a novel Transformer-based framework called Multi-View Human Mesh Transformer (MV-HMT), which is comprised of parallel Tiny CNNs and Transformer Encoder. MV-HMT takes multi-view silhouette as inputs, regresses the parameters of human shape and pose, and is effective for multi-view feature fusion. Real-Time Data Synthetic (RT-DS) technique is proposed in this work to solve the second problem. RT -DS is a plug-and-play component that generates paired silhouettes-mesh on CUDA, and provides an inexhaustible supply of synthesis data for pre-training of the neural network. Our method outperforms existing methods for multi-view HMR on the four-view datasets MPI-INF-3DHP and Human3.6M. Another new three-view dataset, MoVi, with more subjects and more accurate annotation, was used to evaluate the generality of our method and showed remarkable results.","PeriodicalId":318264,"journal":{"name":"2022 2nd International Conference on Networking Systems of AI (INSAI)","volume":"126 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 2nd International Conference on Networking Systems of AI (INSAI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/INSAI56792.2022.00042","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
This paper addresses two key problems of multi-view 3D Human Mesh Reconstruction (HMR): the difficulty of fusing features from multiple images and the lack of training data. We design a novel Transformer-based framework called Multi-View Human Mesh Transformer (MV-HMT), which comprises parallel Tiny CNNs and a Transformer Encoder. MV-HMT takes multi-view silhouettes as input, regresses the parameters of human shape and pose, and is effective for multi-view feature fusion. To solve the second problem, we propose a Real-Time Data Synthetic (RT-DS) technique. RT-DS is a plug-and-play component that generates paired silhouette-mesh data on CUDA and provides an inexhaustible supply of synthetic data for pre-training the neural network. Our method outperforms existing methods for multi-view HMR on the four-view datasets MPI-INF-3DHP and Human3.6M. A new three-view dataset, MoVi, with more subjects and more accurate annotations, was used to evaluate the generality of our method and also yielded remarkable results.
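
To make the architecture description concrete, below is a minimal PyTorch sketch of a multi-view silhouette-to-SMPL regressor in the spirit of MV-HMT. It is not the authors' implementation: the layer sizes, the view-token fusion scheme, and the SMPL parameter split (10 shape + 72 pose coefficients) are illustrative assumptions, since the abstract does not specify them.

```python
# Hedged sketch, NOT the authors' code: parallel per-view Tiny CNNs produce one
# token per silhouette, a Transformer encoder fuses the view tokens, and a
# linear head regresses SMPL shape/pose parameters. All dimensions are assumed.
import torch
import torch.nn as nn


class TinyCNN(nn.Module):
    """Lightweight feature extractor applied to a single binary silhouette."""

    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(128, feat_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, 1, H, W)
        return self.proj(self.net(x).flatten(1))          # (B, feat_dim)


class MVHMTSketch(nn.Module):
    """Parallel Tiny CNNs + Transformer encoder over view tokens, regressing
    SMPL shape (10) and axis-angle pose (72) parameters (assumed split)."""

    def __init__(self, num_views: int = 4, feat_dim: int = 256,
                 num_layers: int = 3, num_heads: int = 4):
        super().__init__()
        self.cnns = nn.ModuleList(TinyCNN(feat_dim) for _ in range(num_views))
        self.view_embed = nn.Parameter(torch.zeros(num_views, feat_dim))
        layer = nn.TransformerEncoderLayer(feat_dim, num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(feat_dim, 10 + 72)

    def forward(self, silhouettes: torch.Tensor):          # (B, V, 1, H, W)
        tokens = torch.stack(
            [cnn(silhouettes[:, v]) for v, cnn in enumerate(self.cnns)], dim=1)
        tokens = tokens + self.view_embed                   # per-view embedding
        fused = self.encoder(tokens).mean(dim=1)            # cross-view fusion
        params = self.head(fused)
        return params[:, :10], params[:, 10:]               # (shape, pose)


# Usage example: a batch of two samples, each with four 224x224 silhouettes.
model = MVHMTSketch(num_views=4)
shape, pose = model(torch.rand(2, 4, 1, 224, 224))
```

The point this sketch tries to capture from the abstract is the fusion strategy: each view is encoded independently by a small CNN, and cross-view interaction happens only through the Transformer encoder's self-attention over the stacked view tokens.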