3D Visual Proxemics: Recognizing Human Interactions in 3D from a Single Image

Ishani Chakraborty, Hui Cheng, O. Javed
{"title":"3D Visual Proxemics: Recognizing Human Interactions in 3D from a Single Image","authors":"Ishani Chakraborty, Hui Cheng, O. Javed","doi":"10.1109/CVPR.2013.437","DOIUrl":null,"url":null,"abstract":"We present a unified framework for detecting and classifying people interactions in unconstrained user generated images. Unlike previous approaches that directly map people/face locations in 2D image space into features for classification, we first estimate camera viewpoint and people positions in 3D space and then extract spatial configuration features from explicit 3D people positions. This approach has several advantages. First, it can accurately estimate relative distances and orientations between people in 3D. Second, it encodes spatial arrangements of people into a richer set of shape descriptors than afforded in 2D. Our 3D shape descriptors are invariant to camera pose variations often seen in web images and videos. The proposed approach also estimates camera pose and uses it to capture the intent of the photo. To achieve accurate 3D people layout estimation, we develop an algorithm that robustly fuses semantic constraints about human interpositions into a linear camera model. This enables our model to handle large variations in people size, heights (e.g. age) and poses. An accurate 3D layout also allows us to construct features informed by Proxemics that improves our semantic classification. To characterize the human interaction space, we introduce visual proxemes, a set of prototypical patterns that represent commonly occurring social interactions in events. We train a discriminative classifier that classifies 3D arrangements of people into visual proxemes and quantitatively evaluate the performance on a large, challenging dataset.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"30 1","pages":"3406-3413"},"PeriodicalIF":0.0000,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"21","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 IEEE Conference on Computer Vision and Pattern Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CVPR.2013.437","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 21

Abstract

We present a unified framework for detecting and classifying interactions between people in unconstrained, user-generated images. Unlike previous approaches that directly map people/face locations in 2D image space into features for classification, we first estimate the camera viewpoint and people's positions in 3D space, and then extract spatial configuration features from the explicit 3D positions. This approach has several advantages. First, it can accurately estimate relative distances and orientations between people in 3D. Second, it encodes spatial arrangements of people into a richer set of shape descriptors than is afforded in 2D. Our 3D shape descriptors are invariant to the camera pose variations often seen in web images and videos. The proposed approach also estimates the camera pose and uses it to capture the intent of the photo. To achieve accurate 3D people-layout estimation, we develop an algorithm that robustly fuses semantic constraints about human interpositions into a linear camera model. This enables our model to handle large variations in people's sizes, heights (e.g., due to age), and poses. An accurate 3D layout also allows us to construct features informed by proxemics that improve our semantic classification. To characterize the human interaction space, we introduce visual proxemes, a set of prototypical patterns that represent commonly occurring social interactions in events. We train a discriminative classifier that maps 3D arrangements of people to visual proxemes and quantitatively evaluate its performance on a large, challenging dataset.
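To make the geometry concrete, the sketch below illustrates the two core ideas of the abstract in a deliberately simplified form: back-projecting face detections into 3D under a pinhole camera with an assumed average face height, and summarizing pairwise 3D distances with a histogram over Edward Hall's proxemic zones. This is a minimal illustration under stated assumptions, not the paper's algorithm: the names (face_to_3d, pairwise_features), the fixed face height, and the known focal length are all introduced here, whereas the actual method fuses semantic interposition constraints into a linear camera model and uses richer 3D shape descriptors.

```python
import numpy as np

# Minimal, hypothetical sketch (not the authors' algorithm): back-project
# face detections into 3D with a pinhole camera and an assumed average face
# height, then histogram pairwise distances over Hall's proxemic zones.

AVG_FACE_HEIGHT_M = 0.25  # assumed real-world face height in meters

def face_to_3d(u, v, h_px, focal_px, cx, cy):
    """Back-project a face detection into camera coordinates.

    Under a pinhole model, apparent size falls off inversely with depth:
        h_px = focal_px * AVG_FACE_HEIGHT_M / z  =>  z = focal_px * H / h_px
    (u, v) is the face center in pixels, h_px its pixel height, and
    (cx, cy) the principal point.
    """
    z = focal_px * AVG_FACE_HEIGHT_M / h_px
    x = (u - cx) * z / focal_px
    y = (v - cy) * z / focal_px
    return np.array([x, y, z])

# Hall's interpersonal zones in meters (upper bounds, commonly cited values).
ZONES = [(0.45, "intimate"), (1.2, "personal"), (3.6, "social"),
         (float("inf"), "public")]

def proxemic_zone(dist_m):
    """Label an interpersonal distance with its proxemic zone."""
    for bound, label in ZONES:
        if dist_m <= bound:
            return label

def pairwise_features(positions):
    """Normalized histogram of proxemic zones over all person pairs --
    a toy stand-in for the paper's richer 3D shape descriptors."""
    counts = {label: 0 for _, label in ZONES}
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            d = float(np.linalg.norm(positions[i] - positions[j]))
            counts[proxemic_zone(d)] += 1
    total = sum(counts.values()) or 1
    return {label: c / total for label, c in counts.items()}

if __name__ == "__main__":
    # Three hypothetical face detections in a 1280x720 image: (u, v, h_px).
    detections = [(400, 300, 80), (520, 310, 75), (1100, 280, 30)]
    focal, cx, cy = 1000.0, 640.0, 360.0  # assumed camera intrinsics
    points = [face_to_3d(u, v, h, focal, cx, cy) for u, v, h in detections]
    print(pairwise_features(points))
```

Normalizing the zone histogram by the number of pairs keeps the feature comparable across group sizes, which echoes the abstract's emphasis on descriptors that remain stable under camera pose and scene variation.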