Proceedings of the 20th ACM International Conference on Multimodal Interaction: Latest Publications

Joint Discrete and Continuous Emotion Prediction Using Ensemble and End-to-End Approaches
Proceedings of the 20th ACM International Conference on Multimodal Interaction Pub Date: 2018-10-02 DOI: 10.1145/3242969.3242972
Ehab Albadawy, Yelin Kim
{"title":"Joint Discrete and Continuous Emotion Prediction Using Ensemble and End-to-End Approaches","authors":"Ehab Albadawy, Yelin Kim","doi":"10.1145/3242969.3242972","DOIUrl":"https://doi.org/10.1145/3242969.3242972","url":null,"abstract":"This paper presents a novel approach in continuous emotion prediction that characterizes dimensional emotion labels jointly with continuous and discretized representations. Continuous emotion labels can capture subtle emotion variations, but their inherent noise often has negative effects on model training. Recent approaches found a performance gain when converting the continuous labels into a discrete set (e.g., using k-means clustering), despite a label quantization error. To find the optimal trade-off between the continuous and discretized emotion representations, we investigate two joint modeling approaches: ensemble and end-to-end. The ensemble model combines the predictions from two models that are trained separately, one with discretized prediction and the other with continuous prediction. On the other hand, the end-to-end model is trained to simultaneously optimize both discretized and continuous prediction tasks in addition to the final combination between them. Our experimental results using the state-of-the-art deep BLSTM network on the RECOLA dataset demonstrate that (i) the joint representation outperforms both individual representation baselines and the state-of-the-art speech based results on RECOLA, validating the assumption that combining continuous and discretized emotion representations yields better performance in emotion prediction; and (ii) the joint representation can help to accelerate convergence, particularly for valence prediction. Our work provides insights into joint discrete and continuous emotion representation and its efficacy for describing dynamically changing affective behavior in valence and activation prediction.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"130 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132751472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
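As a rough illustration of the joint representation described in the abstract above, the sketch below discretizes continuous labels with k-means, trains one model per representation, and blends the two predictions. The synthetic data, the ridge/logistic stand-ins for the paper's BLSTM, and the equal blending weight are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of combining continuous and discretized emotion predictions.
# The ridge/logistic models stand in for the paper's BLSTM; data is synthetic.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge, LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 40))                      # acoustic features per frame
y = np.tanh(X[:, 0] + 0.1 * rng.normal(size=1000))   # continuous arousal labels

# Discretize the continuous labels into k clusters (label quantization).
kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(y.reshape(-1, 1))
y_disc = kmeans.labels_
centers = kmeans.cluster_centers_.ravel()

# Train one regressor on continuous labels and one classifier on discrete labels.
reg = Ridge().fit(X, y)
clf = LogisticRegression(max_iter=1000).fit(X, y_disc)

# Ensemble: average the continuous prediction with the predicted cluster center.
pred_cont = reg.predict(X)
pred_disc = centers[clf.predict(X)]
pred_joint = 0.5 * pred_cont + 0.5 * pred_disc       # assumed equal weighting
```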
A Multimodal-Sensor-Enabled Room for Unobtrusive Group Meeting Analysis
Proceedings of the 20th ACM International Conference on Multimodal Interaction Pub Date: 2018-10-02 DOI: 10.1145/3242969.3243022
Indrani Bhattacharya, Michael Foley, Ni Zhang, Tongtao Zhang, Christine Ku, Cameron Mine, Heng Ji, Christoph Riedl, B. F. Welles, R. Radke
{"title":"A Multimodal-Sensor-Enabled Room for Unobtrusive Group Meeting Analysis","authors":"Indrani Bhattacharya, Michael Foley, Ni Zhang, Tongtao Zhang, Christine Ku, Cameron Mine, Heng Ji, Christoph Riedl, B. F. Welles, R. Radke","doi":"10.1145/3242969.3243022","DOIUrl":"https://doi.org/10.1145/3242969.3243022","url":null,"abstract":"Group meetings can suffer from serious problems that undermine performance, including bias, \"groupthink\", fear of speaking, and unfocused discussion. To better understand these issues, propose interventions, and thus improve team performance, we need to study human dynamics in group meetings. However, this process currently heavily depends on manual coding and video cameras. Manual coding is tedious, inaccurate, and subjective, while active video cameras can affect the natural behavior of meeting participants. Here, we present a smart meeting room that combines microphones and unobtrusive ceiling-mounted Time-of-Flight (ToF) sensors to understand group dynamics in team meetings. We automatically process the multimodal sensor outputs with signal, image, and natural language processing algorithms to estimate participant head pose, visual focus of attention (VFOA), non-verbal speech patterns, and discussion content. We derive metrics from these automatic estimates and correlate them with user-reported rankings of emergent group leaders and major contributors to produce accurate predictors. We validate our algorithms and report results on a new dataset of lunar survival tasks of 36 individuals across 10 groups collected in the multimodal-sensor-enabled smart room.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132137976","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
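As a minimal sketch of the final step described above (relating automatically derived metrics to user-reported leadership rankings), the snippet below correlates a per-participant speaking-time metric with ranked leadership using Spearman correlation. The metric values and rankings are synthetic placeholders, not data or code from the paper.

```python
# Sketch: correlate an automatically derived metric (e.g., total speaking time)
# with user-reported leadership rankings for one meeting group.
import numpy as np
from scipy.stats import spearmanr

speaking_time = np.array([312.0, 145.5, 87.2, 203.9])   # seconds per participant (synthetic)
leader_rank = np.array([1, 3, 4, 2])                    # 1 = most often named emergent leader

rho, p_value = spearmanr(speaking_time, -leader_rank)   # negate so higher metric ~ better rank
print(f"Spearman rho={rho:.2f}, p={p_value:.3f}")
```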
Reinforcing, Reassuring, and Roasting: The Forms and Functions of the Human Smile
Proceedings of the 20th ACM International Conference on Multimodal Interaction Pub Date: 2018-10-02 DOI: 10.1145/3242969.3243393
P. Niedenthal
{"title":"Reinforcing, Reassuring, and Roasting: The Forms and Functions of the Human Smile","authors":"P. Niedenthal","doi":"10.1145/3242969.3243393","DOIUrl":"https://doi.org/10.1145/3242969.3243393","url":null,"abstract":"What are facial expressions for? In social-functional accounts, they are efficient adaptations that are used flexibly to address the problems inherent to successful social living. Facial expressions both broadcast emotions and regulate the emotions of perceivers. Research from my laboratory focuses on the human smile and demonstrates how this very nuanced display varies in its physical form in order to solve three basic social challenges: rewarding others, signaling non-threat, and negotiating social hierarchies. We mathematically modeled the dynamic facial-expression patterns of reward, affiliation, and dominance smiles using a data-driven approach that combined a dynamic facial expression generator with methods of reverse correlation. The resulting models were validated using human-perceiver and Bayesian classifiers. Human smile stimuli were also developed and validated in studies in which distinct effects of the smiles on physiological and hormonal processes were observed. The social-function account is extended to the acoustic form of laughter and is used to address questions about cross-cultural differences in emotional expression.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132370283","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Keep Me in the Loop: Increasing Operator Situation Awareness through a Conversational Multimodal Interface
Proceedings of the 20th ACM International Conference on Multimodal Interaction Pub Date: 2018-10-02 DOI: 10.1145/3242969.3242974
D. A. Robb, Javier Chiyah-Garcia, A. Laskov, Xingkun Liu, P. Patrón, H. Hastie
{"title":"Keep Me in the Loop: Increasing Operator Situation Awareness through a Conversational Multimodal Interface","authors":"D. A. Robb, Javier Chiyah-Garcia, A. Laskov, Xingkun Liu, P. Patrón, H. Hastie","doi":"10.1145/3242969.3242974","DOIUrl":"https://doi.org/10.1145/3242969.3242974","url":null,"abstract":"Autonomous systems are designed to carry out activities in remote, hazardous environments without the need for operators to micro-manage them. It is, however, essential that operators maintain situation awareness in order to monitor vehicle status and handle unforeseen circumstances that may affect their intended behaviour, such as a change in the environment. We present MIRIAM, a multimodal interface that combines visual indicators of status with a conversational agent component. This multimodal interface offers a fluid and natural way for operators to gain information on vehicle status and faults, mission progress and to set reminders. We describe the system and an evaluation study providing evidence that such an interactive multimodal interface can assist in maintaining situation awareness for operators of autonomous systems, irrespective of cognitive styles.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"205 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131960413","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16
Enhancing Multiparty Cooperative Movements: A Robotic Wheelchair that Assists in Predicting Next Actions
Proceedings of the 20th ACM International Conference on Multimodal Interaction Pub Date: 2018-10-02 DOI: 10.1145/3242969.3242983
Hisato Fukuda, K. Yamazaki, Akiko Yamazaki, Yosuke Saito, Emi Iiyama, Seiji Yamazaki, Yoshinori Kobayashi, Y. Kuno, Keiko Ikeda
{"title":"Enhancing Multiparty Cooperative Movements: A Robotic Wheelchair that Assists in Predicting Next Actions","authors":"Hisato Fukuda, K. Yamazaki, Akiko Yamazaki, Yosuke Saito, Emi Iiyama, Seiji Yamazaki, Yoshinori Kobayashi, Y. Kuno, Keiko Ikeda","doi":"10.1145/3242969.3242983","DOIUrl":"https://doi.org/10.1145/3242969.3242983","url":null,"abstract":"When an automatic wheelchair or a self-carrying robot moves along with human agents, prediction for the next possible actions by the participating agents, play an important role in realization of successful cooperation among them. In this paper, we mounted a robot to a wheelchair body so that it provides embodied projective signals to the human agents, indicating the next possible action to be performed by the wheelchair. We have analyzed how human participants, particularly when they are in a multiparty interaction, would respond to such a system in experiments. We designed two settings for the robot's projective behavior. The first design allows the robot to face towards the human agents (Face-to-Face model), and the other allows it to face forward as the human agents do, then turn around to the human agents when it indicates where the wheelchair will move to (Body-Torque model). The analysis examined reactions by the human agents to the wheelchair, his/her accompanying human agent, and others who pass by them in the experiment's setting. The results show that the Body-Torque model seems more effective in enhancing cooperative behavior among the human participants than the Face-to-Face model when they are moving to a forward direction together.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114415093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Multimodal Interaction Modeling of Child Forensic Interviewing
Proceedings of the 20th ACM International Conference on Multimodal Interaction Pub Date: 2018-10-02 DOI: 10.1145/3242969.3243006
V. Ardulov, Madelyn Mendlen, Manoj Kumar, Neha Anand, Shanna Williams, T. Lyon, Shrikanth S. Narayanan
{"title":"Multimodal Interaction Modeling of Child Forensic Interviewing","authors":"V. Ardulov, Madelyn Mendlen, Manoj Kumar, Neha Anand, Shanna Williams, T. Lyon, Shrikanth S. Narayanan","doi":"10.1145/3242969.3243006","DOIUrl":"https://doi.org/10.1145/3242969.3243006","url":null,"abstract":"Constructing computational models of interactions during Forensic Interviews (FI) with children presents a unique challenge in being able to maximize complete and accurate information disclosure, while minimizing emotional trauma experienced by the child. Leveraging multiple channels of observational signals, dynamical system modeling is employed to track and identify patterns in the influence interviewers' linguistic and paralinguistic behavior has on children's verbal recall productivity. Specifically, linear mixed effects modeling and dynamical mode decomposition allow for robust analysis of acoustic-prosodic features, aligned with lexical features at turn-level utterances. By varying the window length, the model parameters evaluate both interviewer and child behaviors at different temporal resolutions, thus capturing both rapport-building and disclosure phases of FI. Making use of a recently proposed definition of productivity, the dynamic systems modeling provides insight into the characteristics of interaction that are most relevant to effectively eliciting narrative and task-relevant information from a child.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115375604","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
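The mode-decomposition step mentioned in the abstract can be sketched as follows on a window of turn-level feature vectors; the feature dimensionality, window length, and rank truncation are illustrative assumptions rather than the authors' settings.

```python
# Sketch of exact dynamic mode decomposition (DMD) on a window of
# turn-level acoustic-prosodic/lexical feature vectors (synthetic here).
import numpy as np

rng = np.random.default_rng(1)
F = rng.normal(size=(12, 30))        # 12 features x 30 consecutive turns in one window

X1, X2 = F[:, :-1], F[:, 1:]         # snapshot pairs: x_t -> x_{t+1}
U, s, Vh = np.linalg.svd(X1, full_matrices=False)
r = 6                                # assumed rank truncation
U_r, s_r, V_r = U[:, :r], s[:r], Vh[:r, :].conj().T

A_tilde = U_r.conj().T @ X2 @ V_r @ np.diag(1.0 / s_r)   # low-rank linear dynamics
eigvals, W = np.linalg.eig(A_tilde)
modes = X2 @ V_r @ np.diag(1.0 / s_r) @ W                # DMD modes

# |eigvals| close to 1 indicates persistent interaction dynamics in the window.
print(np.abs(eigvals))
```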
Functional-Based Acoustic Group Feature Selection for Automatic Recognition of Eating Condition
Proceedings of the 20th ACM International Conference on Multimodal Interaction Pub Date: 2018-10-02 DOI: 10.1145/3242969.3243682
Dara Pir
{"title":"Functional-Based Acoustic Group Feature Selection for Automatic Recognition of Eating Condition","authors":"Dara Pir","doi":"10.1145/3242969.3243682","DOIUrl":"https://doi.org/10.1145/3242969.3243682","url":null,"abstract":"This paper presents the novel Functional-based acoustic Group Feature Selection (FGFS) method for automatic eating condition recognition addressed in the ICMI 2018 Eating Analysis and Tracking Challenge's Food-type Sub-Challenge. The Food-type Sub-Challenge employs the audiovisual iHEARu-EAT database and attempts to classify which of six food types, or none, is being consumed by subjects while speaking. The approach proposed by the FGFS method uses the audio mode and considers the acoustic feature space in groups rather than individually. Each group is comprised of acoustic features generated by the application of a statistical functional to a specified set of the low-level descriptors of the audio data. The FGFS method provides information about the degree of relevance of the statistical functionals to the task. In addition, the partitioning of features into groups allows for more rapid processing of the official Sub-Challenge's large acoustic feature set. The FGFS-based system achieves 2.8% relative Unweighted Average Recall performance improvement over the official Food-type Sub-Challenge baseline on iHEARu-EAT test data.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115135059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
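A minimal sketch of the group-wise idea in the abstract: features are grouped by the statistical functional that produced them, and each group is scored on its own by the unweighted average recall (UAR) of a simple classifier. The synthetic data, the three functional groups, and the linear SVM stand-in are assumptions, not the Sub-Challenge pipeline.

```python
# Sketch: evaluate acoustic feature groups, one group per statistical functional,
# by the unweighted average recall (UAR) each group achieves on its own.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score

rng = np.random.default_rng(2)
n, n_lld = 600, 20
X_all = rng.normal(size=(n, n_lld * 3))          # [mean | std | range] blocks per LLD
y = rng.integers(0, 7, size=n)                   # 6 food types + "no food" (synthetic labels)

groups = {                                       # columns produced by each functional
    "mean":  slice(0, n_lld),
    "std":   slice(n_lld, 2 * n_lld),
    "range": slice(2 * n_lld, 3 * n_lld),
}

X_tr, X_te, y_tr, y_te = train_test_split(X_all, y, test_size=0.3, random_state=0)
for name, cols in groups.items():
    clf = LinearSVC(max_iter=5000).fit(X_tr[:, cols], y_tr)
    uar = recall_score(y_te, clf.predict(X_te[:, cols]), average="macro")  # UAR = macro recall
    print(f"functional={name:5s} UAR={uar:.3f}")
```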
Predicting Engagement Intensity in the Wild Using Temporal Convolutional Network
Proceedings of the 20th ACM International Conference on Multimodal Interaction Pub Date: 2018-10-02 DOI: 10.1145/3242969.3264984
Chinchu Thomas, Nitin Nair, D. Jayagopi
{"title":"Predicting Engagement Intensity in the Wild Using Temporal Convolutional Network","authors":"Chinchu Thomas, Nitin Nair, D. Jayagopi","doi":"10.1145/3242969.3264984","DOIUrl":"https://doi.org/10.1145/3242969.3264984","url":null,"abstract":"Engagement is the holy grail of learning whether it is in a classroom setting or an online learning platform. Studies have shown that engagement of the student while learning can benefit students as well as the teacher if the engagement level of the student is known. It is difficult to keep track of the engagement of each student in a face-to-face learning happening in a large classroom. It is even more difficult in an online learning platform where, the user is accessing the material at different instances. Automatic analysis of the engagement of students can help to better understand the state of the student in a classroom setting as well as online learning platforms and is more scalable. In this paper we propose a framework that uses Temporal Convolutional Network (TCN) to understand the intensity of engagement of students attending video material from Massive Open Online Courses (MOOCs). The input to the TCN network is the statistical features computed on 10 second segments of the video from the gaze, head pose and action unit intensities available in OpenFace library. The ability of the TCN architecture to capture long term dependencies gives it the ability to outperform other sequential models like LSTMs. On the given test set in the EmotiW 2018 sub challenge-\"Engagement in the Wild\", the proposed approach with Dilated-TCN achieved an average mean square error of 0.079.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128287506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 28
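The dilated temporal-convolution idea can be sketched with a small PyTorch stack over a sequence of per-segment statistical feature vectors; the feature dimensionality, number of layers, and channel sizes below are assumed for illustration and are not the authors' architecture.

```python
# Sketch: a small dilated temporal convolutional regressor over a sequence of
# per-10-second statistical feature vectors (gaze, head pose, AU intensities).
import torch
import torch.nn as nn

class TinyTCN(nn.Module):
    def __init__(self, n_features: int = 36, channels: int = 32, levels: int = 3):
        super().__init__()
        layers = []
        in_ch = n_features
        for i in range(levels):
            d = 2 ** i                                   # dilation doubles per level
            layers += [
                nn.Conv1d(in_ch, channels, kernel_size=3, dilation=d, padding=d),
                nn.ReLU(),
            ]
            in_ch = channels
        self.tcn = nn.Sequential(*layers)
        self.head = nn.Linear(channels, 1)               # engagement intensity

    def forward(self, x):                                # x: (batch, time, features)
        h = self.tcn(x.transpose(1, 2))                  # convolve over the time axis
        return self.head(h.mean(dim=2)).squeeze(-1)      # pool over time -> scalar

model = TinyTCN()
segments = torch.randn(4, 18, 36)                        # 4 clips, 18 ten-second segments
print(model(segments).shape)                             # torch.Size([4])
```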
Automatic Recognition of Affective Laughter in Spontaneous Dyadic Interactions from Audiovisual Signals
Proceedings of the 20th ACM International Conference on Multimodal Interaction Pub Date: 2018-10-02 DOI: 10.1145/3242969.3243012
R. Kantharaju, F. Ringeval, L. Besacier
{"title":"Automatic Recognition of Affective Laughter in Spontaneous Dyadic Interactions from Audiovisual Signals","authors":"R. Kantharaju, F. Ringeval, L. Besacier","doi":"10.1145/3242969.3243012","DOIUrl":"https://doi.org/10.1145/3242969.3243012","url":null,"abstract":"Laughter is a highly spontaneous behavior that frequently occurs during social interactions. It serves as an expressive-communicative social signal which conveys a large spectrum of affect display. Even though many studies have been performed on the automatic recognition of laughter -- or emotion -- from audiovisual signals, very little is known about the automatic recognition of emotion conveyed by laughter. In this contribution, we provide insights on emotional laughter by extensive evaluations carried out on a corpus of dyadic spontaneous interactions, annotated with dimensional labels of emotion (arousal and valence). We evaluate, by automatic recognition experiments and correlation based analysis, how different categories of laughter, such as unvoiced laughter, voiced laughter, speech laughter, and speech (non-laughter) can be differentiated from audiovisual features, and to which extent they might convey different emotions. Results show that voiced laughter performed best in the automatic recognition of arousal and valence for both audio and visual features. The context of production is further analysed and results show that, acted and spontaneous expressions of laughter produced by a same person can be differentiated from audiovisual signals, and multilingual induced expressions can be differentiated from those produced during interactions.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125928630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
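As a rough sketch of the correlation-based part of the analysis described above, the snippet below relates a binary indicator of voiced laughter in a segment to its arousal annotation; the values are synthetic and the point-biserial setup is an assumption, not the authors' exact analysis.

```python
# Sketch: correlation-based analysis relating the presence of voiced laughter in
# a segment to its annotated arousal value (synthetic values, illustrative only).
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(3)
voiced_laughter = rng.integers(0, 2, size=200)                      # 1 if segment contains voiced laughter
arousal = 0.4 * voiced_laughter + rng.normal(scale=0.3, size=200)   # dimensional arousal label

r, p = pearsonr(voiced_laughter, arousal)   # Pearson with a binary variable = point-biserial
print(f"r={r:.2f}, p={p:.3g}")
```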
Data Driven Non-Verbal Behavior Generation for Humanoid Robots
Proceedings of the 20th ACM International Conference on Multimodal Interaction Pub Date: 2018-10-02 DOI: 10.1145/3242969.3264970
Taras Kucherenko
{"title":"Data Driven Non-Verbal Behavior Generation for Humanoid Robots","authors":"Taras Kucherenko","doi":"10.1145/3242969.3264970","DOIUrl":"https://doi.org/10.1145/3242969.3264970","url":null,"abstract":"Social robots need non-verbal behavior to make an interaction pleasant and efficient. Most of the models for generating non-verbal behavior are rule-based and hence can produce a limited set of motions and are tuned to a particular scenario. In contrast, data-driven systems are flexible and easily adjustable. Hence we aim to learn a data-driven model for generating non-verbal behavior (in a form of a 3D motion sequence) for humanoid robots. Our approach is based on a popular and powerful deep generative model: Variation Autoencoder (VAE). Input for our model will be multi-modal and we will iteratively increase its complexity: first, it will only use the speech signal, then also the text transcription and finally - the non-verbal behavior of the conversation partner. We will evaluate our system on the virtual avatars as well as on two humanoid robots with different embodiments: NAO and Furhat. Our model will be easily adapted to a novel domain: this can be done by providing application specific training data.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125745313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
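A minimal sketch of the variational autoencoder backbone mentioned in the abstract, here over short motion windows only; the speech and text conditioning described by the author would be added on top, and all dimensions, layer sizes, and the KL weight are illustrative assumptions.

```python
# Sketch: a minimal variational autoencoder over short motion (joint-angle) windows.
# Speech/text conditioning, as described in the abstract, would be added on top;
# all dimensions here are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MotionVAE(nn.Module):
    def __init__(self, motion_dim=45 * 10, latent_dim=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(motion_dim, 128), nn.ReLU())
        self.mu = nn.Linear(128, latent_dim)
        self.logvar = nn.Linear(128, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                 nn.Linear(128, motion_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        return self.dec(z), mu, logvar

def vae_loss(x_hat, x, mu, logvar, kl_weight=0.1):                # kl_weight is an assumption
    recon = F.mse_loss(x_hat, x, reduction="mean")
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl_weight * kld

model = MotionVAE()
motion = torch.randn(8, 450)        # 8 windows of stacked joint-angle frames
x_hat, mu, logvar = model(motion)
print(vae_loss(x_hat, motion, mu, logvar).item())
```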