Proceedings of the 2015 ACM on International Conference on Multimodal Interaction最新文献_第5页

Accuracy vs. Availability Heuristic in Multimodal Affect Detection in the Wild 野外多模态情感检测的准确性与可用性启发式

Proceedings of the 2015 ACM on International Conference on Multimodal Interaction Pub Date : 2015-11-09 DOI: 10.1145/2818346.2820739

Nigel Bosch, Huili Chen, S. D’Mello, R. Baker, V. Shute

{"title":"Accuracy vs. Availability Heuristic in Multimodal Affect Detection in the Wild","authors":"Nigel Bosch, Huili Chen, S. D’Mello, R. Baker, V. Shute","doi":"10.1145/2818346.2820739","DOIUrl":"https://doi.org/10.1145/2818346.2820739","url":null,"abstract":"This paper discusses multimodal affect detection from a fusion of facial expressions and interaction features derived from students' interactions with an educational game in the noisy real-world context of a computer-enabled classroom. Log data of students' interactions with the game and face videos from 133 students were recorded in a computer-enabled classroom over a two day period. Human observers live annotated learning-centered affective states such as engagement, confusion, and frustration. The face-only detectors were more accurate than interaction-only detectors. Multimodal affect detectors did not show any substantial improvement in accuracy over the face-only detectors. However, the face-only detectors were only applicable to 65% of the cases due to face registration errors caused by excessive movement, occlusion, poor lighting, and other factors. Multimodal fusion techniques were able to improve the applicability of detectors to 98% of cases without sacrificing classification accuracy. Balancing the accuracy vs. applicability tradeoff appears to be an important feature of multimodal affect detection.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"43 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88503240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 34

Toward Better Understanding of Engagement in Multiparty Spoken Interaction with Children 更好地理解参与与儿童的多方口头互动

Proceedings of the 2015 ACM on International Conference on Multimodal Interaction Pub Date : 2015-11-09 DOI: 10.1145/2818346.2820733

S. Moubayed, J. Lehman

{"title":"Toward Better Understanding of Engagement in Multiparty Spoken Interaction with Children","authors":"S. Moubayed, J. Lehman","doi":"10.1145/2818346.2820733","DOIUrl":"https://doi.org/10.1145/2818346.2820733","url":null,"abstract":"A system's ability to understand and model a human's engagement during an interactive task is important for both adapting its behavior to the moment and achieving a coherent interaction over time. Standard practice for creating such a capability requires uncovering and modeling the multimodal cues that predict engagement in a given task environment. The first step in this methodology is to have human coders produce \"gold standard\" judgments of sample behavior. In this paper we report results from applying this first step to the complex and varied behavior of children playing a fast-paced, speech-controlled, side-scrolling game called Mole Madness. We introduce a concrete metric for engagement-willingness to continue the interaction--that leads to better inter-coder judgments for children playing in pairs, explore how coders perceive the relative contribution of audio and visual cues, and describe engagement trends and patterns in our population. We also examine how the measures change when the same children play Mole Madness with a robot instead of a peer. We conclude by discussing the implications of the differences within and across play conditions for the automatic estimation of engagement and the extension of our autonomous robot player into a \"buddy\" that can individualize interaction for each player and game.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"143 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75350975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 16

Multimodal Fusion using Respiration and Gaze for Predicting Next Speaker in Multi-Party Meetings 基于呼吸和凝视的多模态融合预测多方会议下一位发言者

Proceedings of the 2015 ACM on International Conference on Multimodal Interaction Pub Date : 2015-11-09 DOI: 10.1145/2818346.2820755

Ryo Ishii, Shiro Kumano, K. Otsuka

{"title":"Multimodal Fusion using Respiration and Gaze for Predicting Next Speaker in Multi-Party Meetings","authors":"Ryo Ishii, Shiro Kumano, K. Otsuka","doi":"10.1145/2818346.2820755","DOIUrl":"https://doi.org/10.1145/2818346.2820755","url":null,"abstract":"Techniques that use nonverbal behaviors to predict turn-taking situations, such as who will be the next speaker and the next utterance timing in multi-party meetings are receiving a lot of attention recently. It has long been known that gaze is a physical behavior that plays an important role in transferring the speaking turn between humans. Recently, a line of research has focused on the relationship between turn-taking and respiration, a biological signal that conveys information about the intention or preliminary action to start to speak. It has been demonstrated that respiration and gaze behavior separately have the potential to allow predicting the next speaker and the next utterance timing in multi-party meetings. As a multimodal fusion to create models for predicting the next speaker in multi-party meetings, we integrated respiration and gaze behavior, which were extracted from different modalities and are completely different in quality, and implemented a model uses information about them to predict the next speaker at the end of an utterance. The model has a two-step processing. The first is to predict whether turn-keeping or turn-taking happens; the second is to predict the next speaker in turn-taking. We constructed prediction models with either respiration or gaze behavior and with both respiration and gaze behaviors as features and compared their performance. The results suggest that the model with both respiration and gaze behaviors performs better than the one using only respiration or gaze behavior. It is revealed that multimodal fusion using respiration and gaze behavior is effective for predicting the next speaker in multi-party meetings. It was found that gaze behavior is more useful for predicting turn-keeping/turn-taking than respiration and that respiration is more useful for predicting the next speaker in turn-taking.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"31 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91535789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 19

I Would Hire You in a Minute: Thin Slices of Nonverbal Behavior in Job Interviews 我会在一分钟内雇用你:工作面试中的非语言行为切片

Proceedings of the 2015 ACM on International Conference on Multimodal Interaction Pub Date : 2015-11-09 DOI: 10.1145/2818346.2820760

L. Nguyen, D. Gática-Pérez

引用次数: 41

Quantification of Cinematography Semiotics for Video-based Facial Emotion Recognition in the EmotiW 2015 Grand Challenge EmotiW 2015大挑战中基于视频的面部情感识别的电影摄影符号学量化

Proceedings of the 2015 ACM on International Conference on Multimodal Interaction Pub Date : 2015-11-09 DOI: 10.1145/2818346.2830592

Albert C. Cruz

{"title":"Quantification of Cinematography Semiotics for Video-based Facial Emotion Recognition in the EmotiW 2015 Grand Challenge","authors":"Albert C. Cruz","doi":"10.1145/2818346.2830592","DOIUrl":"https://doi.org/10.1145/2818346.2830592","url":null,"abstract":"The Emotion Recognition in the Wild challenge poses significant problems to state of the art auditory and visual affect quantification systems. To overcome the challenges, we investigate supplementary meta features based on film semiotics. Movie scenes are often presented and arranged in such a way as to amplify the emotion interpreted by the viewing audience. This technique is referred to as mise en scene in the film industry and involves strict and intentional control of color palette, light source color, and arrangement of actors and objects in the scene. To this end, two algorithms for extracting mise en scene information are proposed. Rule of thirds based motion history histograms detect motion along rule of thirds guidelines. Rule of thirds color layout descriptors compactly describe a scene at rule of thirds intersections. A comprehensive system is proposed that measures expression, emotion, vocalics, syntax, semantics, and film-based meta information. The proposed mise en scene features have a higher classification rate and ROC area than LBP-TOP features on the validation set of the EmotiW 2015 challenge. The complete system improves classification performance over the baseline algorithm by 3.17% on the testing set.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"56 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84877247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 4

NaLMC: A Database on Non-acted and Acted Emotional Sequences in HCI NaLMC: HCI中非行为和行为情绪序列的数据库

Proceedings of the 2015 ACM on International Conference on Multimodal Interaction Pub Date : 2015-11-09 DOI: 10.1145/2818346.2820772

Kim Hartmann, J. Krüger, J. Frommer, A. Wendemuth

引用次数: 0

Instantaneous and Robust Eye-Activity Based Task Analysis 基于瞬时鲁棒眼活动的任务分析

Proceedings of the 2015 ACM on International Conference on Multimodal Interaction Pub Date : 2015-11-09 DOI: 10.1145/2818346.2823312

Hoe Kin Wong

引用次数: 1

Spectators' Synchronization Detection based on Manifold Representation of Physiological Signals: Application to Movie Highlights Detection 基于生理信号流形表示的观众同步检测:在电影亮点检测中的应用

Proceedings of the 2015 ACM on International Conference on Multimodal Interaction Pub Date : 2015-11-09 DOI: 10.1145/2818346.2820773

Michal Muszynski, Theodoros Kostoulas, G. Chanel, Patrizia Lombardo, T. Pun

{"title":"Spectators' Synchronization Detection based on Manifold Representation of Physiological Signals: Application to Movie Highlights Detection","authors":"Michal Muszynski, Theodoros Kostoulas, G. Chanel, Patrizia Lombardo, T. Pun","doi":"10.1145/2818346.2820773","DOIUrl":"https://doi.org/10.1145/2818346.2820773","url":null,"abstract":"Detection of highlights in movies is a challenge for the affective understanding and implicit tagging of films. Under the hypothesis that synchronization of the reaction of spectators indicates such highlights, we define a synchronization measure between spectators that is capable of extracting movie highlights. The intuitive idea of our approach is to define (a) a parameterization of one spectator's physiological data on a manifold; (b) the synchronization measure between spectators as the Kolmogorov-Smirnov distance between local shape distributions of the underlying manifolds. We evaluate our approach using data collected in an experiment where the electro-dermal activity of spectators was recorded during the entire projection of a movie in a cinema. We compare our methodology with baseline synchronization measures, such as correlation, Spearman's rank correlation, mutual information, Kolmogorov-Smirnov distance. Results indicate that the proposed approach allows to accurately distinguish highlight from non-highlight scenes.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"400 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80275527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 10

Utilizing Depth Sensors for Analyzing Multimodal Presentations: Hardware, Software and Toolkits 利用深度传感器分析多模态演示:硬件，软件和工具包

Proceedings of the 2015 ACM on International Conference on Multimodal Interaction Pub Date : 2015-11-09 DOI: 10.1145/2818346.2830605

C. W. Leong, L. Chen, G. Feng, Chong Min Lee, Matthew David Mulholland

{"title":"Utilizing Depth Sensors for Analyzing Multimodal Presentations: Hardware, Software and Toolkits","authors":"C. W. Leong, L. Chen, G. Feng, Chong Min Lee, Matthew David Mulholland","doi":"10.1145/2818346.2830605","DOIUrl":"https://doi.org/10.1145/2818346.2830605","url":null,"abstract":"Body language plays an important role in learning processes and communication. For example, communication research produced evidence that mathematical knowledge can be embodied in gestures made by teachers and students. Likewise, body postures and gestures are also utilized by speakers in oral presentations to convey ideas and important messages. Consequently, capturing and analyzing non-verbal behaviors is an important aspect in multimodal learning analytics (MLA) research. With regard to sensing capabilities, the introduction of depth sensors such as the Microsoft Kinect has greatly facilitated research and development in this area. However, the rapid advancement in hardware and software capabilities is not always in sync with the expanding set of features reported in the literature. For example, though Anvil is a widely used state-of-the-art annotation and visualization toolkit for motion traces, its motion recording component based on OpenNI is outdated. As part of our research in developing multimodal educational assessments, we began an effort to develop and standardize algorithms for purposes of multimodal feature extraction and creating automated scoring models. This paper provides an overview of relevant work in multimodal research on educational tasks, and proceeds to summarize our work using multimodal sensors in developing assessments of communication skills, with attention on the use of depth sensors. Specifically, we focus on the task of public speaking assessment using Microsoft Kinect. Additionally, we introduce an open-source Python package for computing expressive body language features from Kinect motion data, which we hope will benefit the MLA research community.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"7 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83451695","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 20

Deception Detection using Real-life Trial Data 使用真实试验数据进行欺骗检测

Proceedings of the 2015 ACM on International Conference on Multimodal Interaction Pub Date : 2015-11-09 DOI: 10.1145/2818346.2820758

Verónica Pérez-Rosas, M. Abouelenien, Rada Mihalcea, Mihai Burzo

引用次数: 167