利用单目视频增强单帧3D人体姿态估计

2019 IEEE/CVF International Conference on Computer Vision (ICCV) Pub Date : 2019-10-01 DOI:10.1109/ICCV.2019.00228

Zhi Li, Xuan Wang, Fei Wang, Peilin Jiang

{"title":"利用单目视频增强单帧3D人体姿态估计","authors":"Zhi Li, Xuan Wang, Fei Wang, Peilin Jiang","doi":"10.1109/ICCV.2019.00228","DOIUrl":null,"url":null,"abstract":"The premise of training an accurate 3D human pose estimation network is the possession of huge amount of richly annotated training data. Nonetheless, manually obtaining rich and accurate annotations is, even not impossible, tedious and slow. In this paper, we propose to exploit monocular videos to complement the training dataset for the single-image 3D human pose estimation tasks. At the beginning, a baseline model is trained with a small set of annotations. By fixing some reliable estimations produced by the resulting model, our method automatically collects the annotations across the entire video as solving the 3D trajectory completion problem. Then, the baseline model is further trained with the collected annotations to learn the new poses. We evaluate our method on the broadly-adopted Human3.6M and MPI-INF-3DHP datasets. As illustrated in experiments, given only a small set of annotations, our method successfully makes the model to learn new poses from unlabelled monocular videos, promoting the accuracies of the baseline model by about 10%. By contrast with previous approaches, our method does not rely on either multi-view imagery or any explicit 2D keypoint annotations.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"45 1","pages":"2192-2201"},"PeriodicalIF":0.0000,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"34","resultStr":"{\"title\":\"On Boosting Single-Frame 3D Human Pose Estimation via Monocular Videos\",\"authors\":\"Zhi Li, Xuan Wang, Fei Wang, Peilin Jiang\",\"doi\":\"10.1109/ICCV.2019.00228\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The premise of training an accurate 3D human pose estimation network is the possession of huge amount of richly annotated training data. Nonetheless, manually obtaining rich and accurate annotations is, even not impossible, tedious and slow. In this paper, we propose to exploit monocular videos to complement the training dataset for the single-image 3D human pose estimation tasks. At the beginning, a baseline model is trained with a small set of annotations. By fixing some reliable estimations produced by the resulting model, our method automatically collects the annotations across the entire video as solving the 3D trajectory completion problem. Then, the baseline model is further trained with the collected annotations to learn the new poses. We evaluate our method on the broadly-adopted Human3.6M and MPI-INF-3DHP datasets. As illustrated in experiments, given only a small set of annotations, our method successfully makes the model to learn new poses from unlabelled monocular videos, promoting the accuracies of the baseline model by about 10%. By contrast with previous approaches, our method does not rely on either multi-view imagery or any explicit 2D keypoint annotations.\",\"PeriodicalId\":6728,\"journal\":{\"name\":\"2019 IEEE/CVF International Conference on Computer Vision (ICCV)\",\"volume\":\"45 1\",\"pages\":\"2192-2201\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"34\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE/CVF International Conference on Computer Vision (ICCV)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICCV.2019.00228\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCV.2019.00228","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 34

摘要

训练出一个准确的三维人体姿态估计网络的前提是拥有大量注释丰富的训练数据。尽管如此，手动获取丰富而准确的注释，即使不是不可能的，也是冗长而缓慢的。在本文中，我们提出利用单目视频来补充训练数据集，用于单图像3D人体姿态估计任务。一开始，基线模型是用一小组注释进行训练的。通过修复一些由结果模型产生的可靠估计，我们的方法自动收集整个视频中的注释，以解决3D轨迹完成问题。然后，使用收集到的注释对基线模型进行进一步训练，以学习新的姿态。我们在广泛采用的Human3.6M和MPI-INF-3DHP数据集上评估了我们的方法。实验表明，仅在少量注释的情况下，我们的方法就成功地使模型从未标记的单目视频中学习新的姿势，将基线模型的准确率提高了10%左右。与之前的方法相比，我们的方法既不依赖于多视图图像，也不依赖于任何显式的2D关键点注释。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

On Boosting Single-Frame 3D Human Pose Estimation via Monocular Videos

The premise of training an accurate 3D human pose estimation network is the possession of huge amount of richly annotated training data. Nonetheless, manually obtaining rich and accurate annotations is, even not impossible, tedious and slow. In this paper, we propose to exploit monocular videos to complement the training dataset for the single-image 3D human pose estimation tasks. At the beginning, a baseline model is trained with a small set of annotations. By fixing some reliable estimations produced by the resulting model, our method automatically collects the annotations across the entire video as solving the 3D trajectory completion problem. Then, the baseline model is further trained with the collected annotations to learn the new poses. We evaluate our method on the broadly-adopted Human3.6M and MPI-INF-3DHP datasets. As illustrated in experiments, given only a small set of annotations, our method successfully makes the model to learn new poses from unlabelled monocular videos, promoting the accuracies of the baseline model by about 10%. By contrast with previous approaches, our method does not rely on either multi-view imagery or any explicit 2D keypoint annotations.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

自引率

0.00%

发文量