A Novel Transformer-based Framework for Multi-View 3D Human Mesh Reconstruction

2022 2nd International Conference on Networking Systems of AI (INSAI). Pub Date: 2022-10-01. DOI: 10.1109/INSAI56792.2022.00042

Entao Chen, Bobo Ju, Linhua Jiang, Dongfang Zhao


Citations: 0

Abstract

This paper addresses two key problems of multi-view 3D Human Mesh Reconstruction (HMR): the difficulty of fusing features from multiple images and the lack of training data. For the first problem, we design a novel Transformer-based framework called Multi-View Human Mesh Transformer (MV-HMT), which consists of parallel Tiny CNNs and a Transformer Encoder. MV-HMT takes multi-view silhouettes as input, regresses the parameters of human shape and pose, and is effective for multi-view feature fusion. For the second problem, we propose the Real-Time Data Synthetic (RT-DS) technique. RT-DS is a plug-and-play component that generates paired silhouette-mesh samples on CUDA and provides an inexhaustible supply of synthetic data for pre-training the neural network. Our method outperforms existing methods for multi-view HMR on the four-view datasets MPI-INF-3DHP and Human3.6M. We also evaluate the generality of our method on MoVi, a newer three-view dataset with more subjects and more accurate annotations, where it likewise performs well.
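The abstract does not give the internals of MV-HMT, so the following is only a minimal sketch of the general idea of fusing per-view features with self-attention. It is written under several assumptions that are not taken from the paper: each view is already reduced to a small feature vector (a stand-in for a Tiny CNN output), queries, keys, and values are the raw features with no learned projections, there is a single attention head with no feed-forward or normalization layers, and the fused vector is obtained by mean pooling. The function names and the feature values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(tokens):
    """Scaled dot-product attention weights between every pair of view tokens.

    Assumption: queries and keys are the raw tokens (identity projections).
    """
    scale = math.sqrt(len(tokens[0]))
    return [
        softmax([sum(q * k for q, k in zip(query, key)) / scale for key in tokens])
        for query in tokens
    ]


def fuse_views(tokens):
    """Mix information across views with self-attention, then mean-pool.

    Assumption: values are the raw tokens, and mean pooling produces the
    single fused vector that a regression head would consume.
    """
    weights = attention_weights(tokens)
    dim = len(tokens[0])
    # Each attended token is a weighted average of all view tokens.
    attended = [
        [sum(w * tok[d] for w, tok in zip(row, tokens)) for d in range(dim)]
        for row in weights
    ]
    count = len(attended)
    return [sum(tok[d] for tok in attended) / count for d in range(dim)]


# Hypothetical 4-dimensional features for four camera views (made-up values).
view_features = [
    [1.0, 0.0, 0.5, 0.2],
    [0.9, 0.1, 0.4, 0.3],
    [0.0, 1.0, 0.2, 0.8],
    [0.1, 0.8, 0.3, 0.7],
]

fused = fuse_views(view_features)
print(fused)
```

Because this sketch uses no positional encoding and pools by averaging, the fused vector does not depend on the order in which the views are supplied. Whether MV-HMT shares that property is not stated in the abstract.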

