Learning Temporal 3D Human Pose Estimation with Pseudo-Labels

2021 17th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS) | Pub Date: 2021-10-14 | DOI: 10.1109/AVSS52988.2021.9663755

Arij Bouazizi, U. Kressel, Vasileios Belagiannis


Citations: 5

Abstract

We present a simple yet effective approach for self-supervised 3D human pose estimation. Unlike prior work, we explore temporal information alongside multi-view self-supervision. During training, we rely on triangulating the 2D body pose estimates of a multi-view camera system. A temporal convolutional neural network is trained with the generated 3D ground truth and a geometric multi-view consistency loss, which imposes geometric constraints on the predicted 3D body skeleton. During inference, our model receives a sequence of 2D body pose estimates from a single view and predicts the 3D body pose for each of them. An extensive evaluation shows that our method achieves state-of-the-art performance on the Human3.6M and MPI-INF-3DHP benchmarks. Our code and models are publicly available at https://github.com/vru2020/TM_HPE/.
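The abstract does not say how the triangulation is carried out, so the following is only a minimal sketch of how pseudo 3D labels could be produced from multi-view 2D estimates. It assumes standard linear (DLT) triangulation and known 3x4 camera projection matrices. The function names, the camera parameters, and the three-joint toy pose are hypothetical illustrations and are not taken from the authors' repository.

```python
import numpy as np


def triangulate_joint(proj_mats, points_2d):
    """Linear (DLT) triangulation of one joint seen in V calibrated views.

    proj_mats: (V, 3, 4) projection matrices (assumed known).
    points_2d: (V, 2) pixel coordinates of the same joint in each view.
    Returns the 3D point as an array of shape (3,).
    """
    rows = []
    for P, (u, v) in zip(proj_mats, points_2d):
        # Each view contributes two linear constraints on the homogeneous point.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]


def triangulate_pose(proj_mats, poses_2d):
    """Triangulate a whole skeleton. poses_2d: (V, J, 2) -> (J, 3)."""
    num_joints = poses_2d.shape[1]
    return np.stack(
        [triangulate_joint(proj_mats, poses_2d[:, j]) for j in range(num_joints)]
    )


def project(P, points_3d):
    """Project (J, 3) points with a 3x4 matrix P to (J, 2) pixel coordinates."""
    homog = np.concatenate([points_3d, np.ones((points_3d.shape[0], 1))], axis=1)
    x = homog @ P.T
    return x[:, :2] / x[:, 2:3]


# Hypothetical two-camera rig: shared intrinsics, second camera rotated and shifted.
K = np.array([[1000.0, 0.0, 500.0], [0.0, 1000.0, 500.0], [0.0, 0.0, 1.0]])
angle = np.deg2rad(30.0)
R2 = np.array(
    [
        [np.cos(angle), 0.0, np.sin(angle)],
        [0.0, 1.0, 0.0],
        [-np.sin(angle), 0.0, np.cos(angle)],
    ]
)
t2 = np.array([[-2.0], [0.0], [0.5]])
proj_mats = np.stack(
    [
        K @ np.hstack([np.eye(3), np.zeros((3, 1))]),
        K @ np.hstack([R2, t2]),
    ]
)

# Toy 3D pose (three joints, in metres) standing in for a real skeleton.
pose_3d = np.array([[0.0, 0.0, 4.0], [0.2, -0.5, 4.2], [-0.3, 0.4, 3.8]])

# In practice these would be the outputs of a 2D pose estimator for each view.
poses_2d = np.stack([project(P, pose_3d) for P in proj_mats])

pseudo_label = triangulate_pose(proj_mats, poses_2d)
print(np.round(pseudo_label, 3))
```

With noise-free 2D inputs the triangulated pose matches the toy pose up to numerical precision. With real 2D detections the result is only an approximation, which is why such outputs are treated as pseudo-labels rather than true ground truth.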
