Three-Dimensional Reconstruction of Human Interactions

2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Pub Date: 2020-06-01, DOI: 10.1109/cvpr42600.2020.00724

Mihai Fieraru, M. Zanfir, Elisabeta Oneata, A. Popa, Vlad Olaru, C. Sminchisescu

Pages: 7212-7221

Citations: 57

Abstract

Understanding 3D human interactions is fundamental for fine-grained scene analysis and behavioral modeling. However, most existing models analyze a single person in isolation, and those that process several people focus largely on resolving multi-person data association rather than inferring interactions. This may lead to incorrect, lifeless 3D estimates that miss the subtle human contact aspects (the essence of the event) and are of little use for detailed behavioral understanding. This paper addresses these issues and makes several contributions: (1) we introduce models for interaction signature estimation (ISP) encompassing contact detection, segmentation, and 3D contact signature prediction; (2) we show how such components can be leveraged to produce augmented losses that ensure contact consistency during 3D reconstruction; (3) we construct several large datasets for learning and evaluating 3D contact prediction and reconstruction methods; specifically, we introduce CHI3D, a lab-based accurate 3D motion capture dataset with 631 sequences containing 2,525 contact events and 728,664 ground-truth 3D poses, as well as FlickrCI3D, a dataset of 11,216 images with 14,081 processed pairs of people and 81,233 facet-level surface correspondences within 138,213 selected contact regions. Finally, (4) we present models and baselines to illustrate how contact estimation supports meaningful 3D reconstruction in which essential interactions are captured. Models and data are made available for research purposes at http://vision.imar.ro/ci3d.
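
To make contribution (2) more concrete, the following is a minimal illustrative sketch of what a contact-consistency term could look like. It is not the loss used in the paper: the function names, the representation of a contact region as a plain list of 3D points, and the choice of closest-point distance averaged over region pairs are all assumptions made for illustration only.

```python
import math
from typing import Sequence, Tuple

# Hypothetical representation: a 3D point and a surface region as a list of points.
Point = Tuple[float, float, float]
Region = Sequence[Point]


def region_distance(region_a: Region, region_b: Region) -> float:
    """Smallest Euclidean distance between any point of region_a and any point of region_b."""
    return min(math.dist(p, q) for p in region_a for q in region_b)


def contact_consistency_loss(pairs: Sequence[Tuple[Region, Region]]) -> float:
    """Mean closest-point distance over region pairs predicted to be in contact.

    The value is zero when every predicted pair actually touches in the
    reconstruction, and grows as the paired regions drift apart.
    """
    if not pairs:
        return 0.0  # no predicted contacts, so nothing to penalize
    return sum(region_distance(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy data: a region on person A's hand and a region on person B's back.
    hand_a = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0)]
    back_b = [(0.1, 0.0, 0.05), (0.5, 0.0, 0.0)]
    print(contact_consistency_loss([(hand_a, back_b)]))
```

In a reconstruction pipeline, a term of this kind would typically be added to the usual image-fitting objective so that the optimizer is pulled toward poses in which the predicted contact regions meet; how the paper weights or formulates its own losses is not stated in the abstract.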


