Multimodal analysis of free-standing conversational groups

Xavier Alameda-Pineda, E. Ricci, N. Sebe
{"title":"Multimodal analysis of free-standing conversational groups","authors":"Xavier Alameda-Pineda, E. Ricci, N. Sebe","doi":"10.1145/3122865.3122869","DOIUrl":null,"url":null,"abstract":"\"Free-standing conversational groups\" are what we call the elementary building blocks of social interactions formed in settings when people are standing and congregate in groups. The automatic detection, analysis, and tracking of such structural conversational units captured on camera poses many interesting challenges for the research community. First, although delineating these formations is strongly linked to other behavioral cues such as head and body poses, finding methods that successfully describe and exploit these links is not obvious. Second, the use of visual data is crucial, but when analyzing crowded scenes, one must account for occlusions and low-resolution images. In this regard, the use of other sensing technologies such as wearable devices can facilitate the analysis of social interactions by complementing the visual information. Yet the exploitation of multiple modalities poses other challenges in terms of data synchronization, calibration, and fusion. In this chapter, we discuss recent advances in multimodal social scene analysis, in particular for the detection of conversational groups or F-formations [Kendon 1990]. More precisely, a multimodal joint head and body pose estimator is described and compared to other recent approaches for head and body pose estimation and F-formation detection. Experimental results on the recently published SALSA dataset are reported, they evidence the long road toward a fully automated high-precision social scene analysis framework.","PeriodicalId":408764,"journal":{"name":"Frontiers of Multimedia Research","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers of Multimedia Research","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3122865.3122869","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

"Free-standing conversational groups" are what we call the elementary building blocks of social interactions formed in settings when people are standing and congregate in groups. The automatic detection, analysis, and tracking of such structural conversational units captured on camera poses many interesting challenges for the research community. First, although delineating these formations is strongly linked to other behavioral cues such as head and body poses, finding methods that successfully describe and exploit these links is not obvious. Second, the use of visual data is crucial, but when analyzing crowded scenes, one must account for occlusions and low-resolution images. In this regard, the use of other sensing technologies such as wearable devices can facilitate the analysis of social interactions by complementing the visual information. Yet the exploitation of multiple modalities poses other challenges in terms of data synchronization, calibration, and fusion. In this chapter, we discuss recent advances in multimodal social scene analysis, in particular for the detection of conversational groups or F-formations [Kendon 1990]. More precisely, a multimodal joint head and body pose estimator is described and compared to other recent approaches for head and body pose estimation and F-formation detection. Experimental results on the recently published SALSA dataset are reported, they evidence the long road toward a fully automated high-precision social scene analysis framework.