{"title":"Session details: Oral Session 1: Dialogue and Social Interaction","authors":"T. Nishida","doi":"10.1145/3246741","DOIUrl":"https://doi.org/10.1145/3246741","url":null,"abstract":"","PeriodicalId":389037,"journal":{"name":"Proceedings of the 16th International Conference on Multimodal Interaction","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128396294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Combining Multimodal Features with Hierarchical Classifier Fusion for Emotion Recognition in the Wild","authors":"Bo Sun, Liandong Li, Tian Zuo, Ying Chen, Guoyan Zhou, Xuewen Wu","doi":"10.1145/2663204.2666272","DOIUrl":"https://doi.org/10.1145/2663204.2666272","url":null,"abstract":"Emotion recognition in the wild is a very challenging task. In this paper, we investigate a variety of different multimodal features from video and audio to evaluate their discriminative ability to human emotion analysis. For each clip, we extract SIFT, LBP-TOP, PHOG, LPQ-TOP and audio features. We train different classifiers for every kind of features on the dataset from EmotiW 2014 Challenge, and we propose a novel hierarchical classifier fusion method for all the extracted features. The final achievement we gained on the test set is 47.17% which is much better than the best baseline recognition rate of 33.7%.","PeriodicalId":389037,"journal":{"name":"Proceedings of the 16th International Conference on Multimodal Interaction","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121394935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Speech-Driven Animation Constrained by Appropriate Discourse Functions","authors":"Najmeh Sadoughi, Yang Liu, C. Busso","doi":"10.1145/2663204.2663252","DOIUrl":"https://doi.org/10.1145/2663204.2663252","url":null,"abstract":"Conversational agents provide powerful opportunities to interact and engage with the users. The challenge is how to create naturalistic behaviors that replicate the complex gestures observed during human interactions. Previous studies have used rule-based frameworks or data-driven models to generate appropriate gestures, which are properly synchronized with the underlying discourse functions. Among these methods, speech-driven approaches are especially appealing given the rich information conveyed on speech. It captures emotional cues and prosodic patterns that are important to synthesize behaviors (i.e., modeling the variability and complexity of the timings of the behaviors). The main limitation of these models is that they fail to capture the underlying semantic and discourse functions of the message (e.g., nodding). This study proposes a speech-driven framework that explicitly model discourse functions, bridging the gap between speech-driven and rule-based models. The approach is based on dynamic Bayesian Network (DBN), where an additional node is introduced to constrain the models by specific discourse functions. We implement the approach by synthesizing head and eyebrow motion. We conduct perceptual evaluations to compare the animations generated using the constrained and unconstrained models.","PeriodicalId":389037,"journal":{"name":"Proceedings of the 16th International Conference on Multimodal Interaction","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131172284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Digital Reading Support for The Blind by Multimodal Interaction","authors":"Yasmine N. El-Glaly, Francis K. H. Quek","doi":"10.1145/2663204.2663266","DOIUrl":"https://doi.org/10.1145/2663204.2663266","url":null,"abstract":"Slate-type devices allow Individuals with Blindness or Severe Visual Impairment (IBSVI) to read in place with the touch of their fingertip by audio-rendering the words they touch. Such technologies are helpful for spatial cognition while reading. However, users have to move their fingers slowly or they may lose place on screen. Also, IBSVI may wander between lines without realizing they did. In this paper, we address these two interaction problems by introducing dynamic speech-touch interaction model, and intelligent reading support system. With this model, the speed of the speech will dynamically change coping up with the user's finger speed. The proposed model is composed of: 1- Audio Dynamics Model, and 2- Off-line Speech Synthesis Technique. The intelligent reading support system predicts the direction of reading, corrects the reading word if the user drifts, and notifies the user using a sonic gutter to help her from straying off the reading line. We tested the new audio dynamics model, the sonic gutter, and the reading support model in two user studies. The participants' feedback helped us fine-tune the parameters of the two models. Finally, we ran an evaluation study where the reading support system is compared to other VoiceOver technologies. The results showed preponderance to the reading support system with its audio dynamics and intelligent reading support components.","PeriodicalId":389037,"journal":{"name":"Proceedings of the 16th International Conference on Multimodal Interaction","volume":"177 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132646160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ICMI 2014 Workshop on Multimodal, Multi-Party, Real-World Human-Robot Interaction","authors":"M. Foster, M. Giuliani, Ronald P. A. Petrick","doi":"10.1145/2663204.2668319","DOIUrl":"https://doi.org/10.1145/2663204.2668319","url":null,"abstract":"The Workshop on Multimodal, Multi-Party, Real-World Human-Robot Interaction will be held in Istanbul on 16 November 2014, co-located with the 16th International Conference on Multimodal Interaction (ICMI 2014). The workshop objective is to address the challenges that robots face when interacting with humans in real-world scenarios. The workshop brings together researchers from intention and activity recognition, person tracking, robust speech recognition and language processing, multimodal fusion, planning and decision making under uncertainty, and service robot design. The programme consists of two invited talks, three long paper talks, and seven late-breaking abstracts. Information on the workshop and pointers to workshop papers and slides can be found at http://www.macs.hw.ac.uk/~mef3/icmi-2014-workshop-hri/.","PeriodicalId":389037,"journal":{"name":"Proceedings of the 16th International Conference on Multimodal Interaction","volume":"421 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131593887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Natural Communication about Uncertainties in Situated Interaction","authors":"T. Pejsa, D. Bohus, Michael F. Cohen, C. Saw, James Mahoney, E. Horvitz","doi":"10.1145/2663204.2663249","DOIUrl":"https://doi.org/10.1145/2663204.2663249","url":null,"abstract":"Physically situated, multimodal interactive systems must often grapple with uncertainties about properties of the world, people, and their intentions and actions. We present methods for estimating and communicating about different uncertainties in situated interaction, leveraging the affordances of an embodied conversational agent. The approach harnesses a representation that captures both the magnitude and the sources of uncertainty, and a set of policies that select and coordinate the production of nonverbal and verbal behaviors to communicate the system's uncertainties to conversational participants. The methods are designed to enlist participants' help in a natural manner to resolve uncertainties arising during interactions. We report on a preliminary implementation of the proposed methods in a deployed system and illustrate the functionality with a trace from a sample interaction.","PeriodicalId":389037,"journal":{"name":"Proceedings of the 16th International Conference on Multimodal Interaction","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134518702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Perceptions of Interpersonal Behavior are Influenced by Gender, Facial Expression Intensity, and Head Pose","authors":"J. Girard","doi":"10.1145/2663204.2667575","DOIUrl":"https://doi.org/10.1145/2663204.2667575","url":null,"abstract":"Across multiple channels, nonverbal behavior communicates information about affective states and interpersonal intentions. Researchers interested in understanding how these nonverbal messages are transmitted and interpreted have examined the relationship between behavior and ratings of interpersonal motives using dimensions such as agency and communion. However, previous work has focused on images of posed behavior and it is unclear how well these results will generalize to more dynamic representations of real-world behavior. The current study proposes to extend the current literature by examining how gender, facial expression intensity, and head pose influence interpersonal ratings in videos of spontaneous nonverbal behavior.","PeriodicalId":389037,"journal":{"name":"Proceedings of the 16th International Conference on Multimodal Interaction","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132903185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Orchestration for Group Videoconferencing: An Interactive Demonstrator","authors":"Wolfgang Weiss, Rene Kaiser, Manolis Falelakis","doi":"10.1145/2663204.2669624","DOIUrl":"https://doi.org/10.1145/2663204.2669624","url":null,"abstract":"In this demonstration we invite visitors to join a live videoconferencing session with remote participants across Europe. We demonstrate the behavior of an automatic decision making component in the realm of social video communication. Our approach takes into account several aspects such as the current conversational situation, conversational metrics of the past, and device capabilities, to make decisions on the visual representation of available video streams. The combination of these cues and the application of automatic decision making rules results into commands of how to mix and how to compose the available video streams for each conversation node's screen. The demo's features are another step towards optimally supporting users in communication within various communication contexts and adapting the user interface to the users' needs.","PeriodicalId":389037,"journal":{"name":"Proceedings of the 16th International Conference on Multimodal Interaction","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134068466","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Keynote Address 3","authors":"O. Aran","doi":"10.1145/3246749","DOIUrl":"https://doi.org/10.1145/3246749","url":null,"abstract":"","PeriodicalId":389037,"journal":{"name":"Proceedings of the 16th International Conference on Multimodal Interaction","volume":"06 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130280200","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Social Touch Intelligence: Developing a Robust System for Automatic Touch Recognition","authors":"Merel M. Jung","doi":"10.1145/2663204.2666281","DOIUrl":"https://doi.org/10.1145/2663204.2666281","url":null,"abstract":"Touch behavior is of great importance during social interaction. Automatic recognition of social touch is necessary to transfer the touch modality from interpersonal interaction to other areas such as Human-Robot Interaction (HRI). This paper describes a PhD research program on the automatic detection, classification and interpretation of touch in social interaction between humans and artifacts. Progress thus far includes the recording of a Corpus of Social Touch (CoST) consisting of pressure sensor data of 14 different touch gestures and first classification results. Classification of these 14 gestures resulted in an overall accuracy of 53% using Bayesian classifiers. Further work includes the enhancement of the gesture recognition, building an embodied system for real-time classification and testing this system in a possible application scenario.","PeriodicalId":389037,"journal":{"name":"Proceedings of the 16th International Conference on Multimodal Interaction","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130704032","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}