Modelling and analyzing multimodal dyadic interactions using social networks

ICMI-MLMI '10 | Pub Date: 2010-11-08 | DOI: 10.1145/1891903.1891967

Sergio Escalera, P. Radeva, Jordi Vitrià, Xavier Baró, B. Raducanu

Citations: 3

Abstract

Social network analysis has become a common technique for modelling and quantifying the properties of social interactions. In this paper, we propose an integrated framework to explore the characteristics of a social network extracted from multimodal dyadic interactions. First, speech detection is performed through an audio/visual fusion scheme based on stacked sequential learning. In the audio domain, speech is detected by clustering audio features; each cluster is modelled by a one-state Hidden Markov Model containing a diagonal-covariance Gaussian Mixture Model. In the visual domain, speech is detected through differential-based feature extraction from the segmented mouth region, followed by a dynamic programming matching procedure. Second, to model the dyadic interactions, we employ the Influence Model, whose states encode the previously integrated audio/visual data. Third, the social network is extracted from the estimated influences. For our study, we used a set of videos from the New York Times' Blogging Heads opinion blog. Results are reported in terms of both the accuracy of the audio/visual data fusion and the centrality measures used to characterize the social network.
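
The last step of the abstract, characterizing the extracted network with centrality measures, can be illustrated with a minimal sketch. The abstract does not say which centrality measures the authors use, so weighted in- and out-degree are assumed here as one simple choice. The function name `degree_centrality`, the matrix `INFLUENCE`, and its values are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical influence matrix for three speakers (illustrative values only).
# INFLUENCE[i][j] is the assumed estimated influence of speaker i on speaker j.
INFLUENCE: List[List[float]] = [
    [0.0, 0.6, 0.2],
    [0.1, 0.0, 0.3],
    [0.4, 0.5, 0.0],
]


def degree_centrality(influence: List[List[float]]) -> Dict[str, List[float]]:
    """Return weighted out-degree and in-degree for each node.

    Out-degree sums the influence a speaker exerts on the others;
    in-degree sums the influence a speaker receives. Self-loops are ignored.
    """
    n = len(influence)
    out_deg = [sum(influence[i][j] for j in range(n) if j != i) for i in range(n)]
    in_deg = [sum(influence[j][i] for j in range(n) if j != i) for i in range(n)]
    return {"out": out_deg, "in": in_deg}


def argmax(values: List[float]) -> int:
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda k: values[k])


if __name__ == "__main__":
    centrality = degree_centrality(INFLUENCE)
    print("most influential speaker:", argmax(centrality["out"]))
    print("most influenced speaker:", argmax(centrality["in"]))
```

In this reading, a speaker with high out-degree exerts the most influence and a speaker with high in-degree receives the most; other measures such as closeness or betweenness would follow the same pattern on the same matrix.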
