{"title":"Assessment of users' interests in multimodal dialog based on exchange unit","authors":"Sayaka Tomimasu, Masahiro Araki","doi":"10.1145/3011263.3011269","DOIUrl":"https://doi.org/10.1145/3011263.3011269","url":null,"abstract":"A person is more likely to enjoy long-term conversations with a robot if it has the capability to infer the topics that interest the person. In this paper, we propose a method of deducing the specific topics that interest a user by sequentially assessing each exchange in a chat-oriented dialog session. We use multimodal information such as facial expressions and prosodic information obtained from the user's utterances for assessing interest as these parameters are independent of linguistic information that varies widely in chat-oriented dialogs. The results show that the accuracy of the assessment of the user's interest is better when we use both features.","PeriodicalId":272696,"journal":{"name":"Proceedings of the Workshop on Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction","volume":"3 3-4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128275180","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deictic gestures in coaching interactions","authors":"I. D. Kok, J. Hough, David Schlangen, S. Kopp","doi":"10.1145/3011263.3011267","DOIUrl":"https://doi.org/10.1145/3011263.3011267","url":null,"abstract":"In motor skill coaching interaction coaches use several techniques to improve the motor skill of the coachee. Through goal setting, explanations, instructions and feedback the coachee is motivated and guided to improve the motor skill. These verbal speech actions are often accompanied by iconic or deictic gestures and other nonverbal acts, such as demonstrations. We are building a virtual coach that is capable of the same behaviour. In this paper we have taken a closer look at the form, type and timing of deictic gestures in our corpus of human-human coaching interactions. We show that a significant amount of the deictic gestures actually touch the referred object, that most of the gestures are complimentary (contrary to previous research) and often occur before the lexical affiliate.","PeriodicalId":272696,"journal":{"name":"Proceedings of the Workshop on Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129366758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Attitude recognition of video bloggers using audio-visual descriptors","authors":"F. Haider, L. Cerrato, S. Luz, N. Campbell","doi":"10.1145/3011263.3011270","DOIUrl":"https://doi.org/10.1145/3011263.3011270","url":null,"abstract":"In social media, vlogs (video blogs) are a form of unidirectional communication, where the vloggers (video bloggers) convey their messages (opinions, thoughts, etc.) to a potential audience which cannot give them feedback in real time. In this kind of communication, the non-verbal behaviour and personality impression of a video blogger tends to influence viewers' attention because non-verbal cues are correlated with the messages conveyed by a vlogger. In this study, we use the acoustic and visual features (body movements that are captured by low-level visual descriptors) to predict the six different attitudes (amusement, enthusiasm, friendliness, frustration, impatience and neutral) annotated in the speech of 10 video bloggers. The automatic detection of attitude can be helpful in a scenario where a machine has to automatically provide feedback to bloggers about their performance in terms of the extent to which they manage to engage the audience by displaying certain attitudes. Attitude recognition models are trained using the random forest classifier. Results show that: 1) acoustic features provide better accuracy than the visual features, 2) while fusion of audio and visual features does not increase overall accuracy, it improves the results for some attitudes and subjects, and 3) densely extracted histograms of flow provide better results than other visual descriptors. A three-class (positive, negative and neutral attitudes) problem has also been defined. Results for this setting show that feature fusion degrades overall classifier accuracy, and the classifiers perform better on the original six-class problem than on the three-class setting.","PeriodicalId":272696,"journal":{"name":"Proceedings of the Workshop on Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122902222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Increasing robustness of multimodal interaction via individual interaction histories","authors":"Felix Schüssel, F. Honold, N. Bubalo, M. Weber","doi":"10.1145/3011263.3011273","DOIUrl":"https://doi.org/10.1145/3011263.3011273","url":null,"abstract":"Multimodal input fusion can be considered a well researched topic and yet it is rarely found in real world applications. One reason for this could be the lack of robustness in real world situations, especially regarding unimodal recognition technologies like speech and gesture, that tend to produce erroneous inputs that can not be detected by the subsequent multimodal input fusion mechanism. Previous work implying the possibility to detect and overcome such errors through knowledge of individual temporal behaviors has neither provided a real-time implementation nor evaluated the real benefit of such an approach. We present such an implementation of applying individual interaction histories in order to increase the robustness of multimodal inputs within a smartwatch scenario. We show how such knowledge can be created and maintained at runtime, present evaluation data from an experiment conducted in a realistic scenario, and compare the approach to the state of the art known from literature. Our approach is ready to use in other applications and existing systems, with the prospect to increase the overall robustness of future multimodal systems.","PeriodicalId":272696,"journal":{"name":"Proceedings of the Workshop on Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128419500","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analysis of gesture frequency and amplitude as a function of personality in virtual agents","authors":"Alex Rayón, Timothy Gonzalez, D. Novick","doi":"10.1145/3011263.3011266","DOIUrl":"https://doi.org/10.1145/3011263.3011266","url":null,"abstract":"Embodied conversational agents are changing the way humans interact with technology. In order to develop humanlike ECAs they need to be able to perform natural gestures that are used in day-to-day conversation. Gestures can give insight into an ECAs personality trait of extraversion, but what factors into it is still being explored. Our study focuses on two aspects of gesture: amplitude and frequency. Our goal is to find out whether agents should use specific gestures more frequently than others depending on the personality type they have been designed with. We also look to quantify gesture amplitude and compare it to a previous study on the perception of an agent's naturalness of its gestures. Our results showed some indication that introverts and extraverts judge the agent's naturalness similarly. The larger the amplitude our agent used, the more natural its gestures were perceived. The frequency of gestures between extraverts and introverts seem to contain hardly any difference, even in terms of types of gesture used.","PeriodicalId":272696,"journal":{"name":"Proceedings of the Workshop on Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116867736","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the Workshop on Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction","authors":"Ronald Böck, Francesca Bonin, N. Campbell, R. Poppe","doi":"10.1145/3011263","DOIUrl":"https://doi.org/10.1145/3011263","url":null,"abstract":"","PeriodicalId":272696,"journal":{"name":"Proceedings of the Workshop on Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128097069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Annotation and analysis of listener's engagement based on multi-modal behaviors","authors":"K. Inoue, Divesh Lala, Shizuka Nakamura, K. Takanashi, Tatsuya Kawahara","doi":"10.1145/3011263.3011271","DOIUrl":"https://doi.org/10.1145/3011263.3011271","url":null,"abstract":"We address the annotation of engagement in the context of human-machine interaction. Engagement represents the level of how much a user is being interested in and willing to continue the current interaction. The conversational data used in the annotation work is a human-robot interaction corpus where a human subject talks with the android ERICA, which is remotely operated by another human subject. The annotation work was done by multiple third-party annotators, and the task was to detect the time point when the level of engagement becomes high. The annotation results indicate that there are agreements among the annotators although the numbers of annotated points are different among them. It is also found that the level of engagement is related to turn-taking behaviors. Furthermore, we conducted interviews with the annotators to reveal behaviors used to show a high level of engagement. The results suggest that laughing, backchannels and nodding are related to the level of engagement.","PeriodicalId":272696,"journal":{"name":"Proceedings of the Workshop on Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127549476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic annotation of gestural units in spontaneous face-to-face interaction","authors":"Simon Alexanderson, D. House, J. Beskow","doi":"10.1145/3011263.3011268","DOIUrl":"https://doi.org/10.1145/3011263.3011268","url":null,"abstract":"Speech and gesture co-occur in spontaneous dialogue in a highly complex fashion. There is a large variability in the motion that people exhibit during a dialogue, and different kinds of motion occur during different states of the interaction. A wide range of multimodal interface applications, for example in the fields of virtual agents or social robots, can be envisioned where it is important to be able to automatically identify gestures that carry information and discriminate them from other types of motion. While it is easy for a human to distinguish and segment manual gestures from a flow of multimodal information, the same task is not trivial to perform for a machine. In this paper we present a method to automatically segment and label gestural units from a stream of 3D motion capture data. The gestural flow is modeled with a 2-level Hierarchical Hidden Markov Model (HHMM) where the sub-states correspond to gesture phases. The model is trained based on labels of complete gesture units and self-adaptive manipulators. The model is tested and validated on two datasets differing in genre and in method of capturing motion, and outperforms a state-of-the-art SVM classifier on a publicly available dataset.","PeriodicalId":272696,"journal":{"name":"Proceedings of the Workshop on Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114748909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fitmirror: a smart mirror for positive affect in everyday user morning routines","authors":"Daniel Besserer, Johannes Bäurle, Alexander Nikic, F. Honold, Felix Schüssel, M. Weber","doi":"10.1145/3011263.3011265","DOIUrl":"https://doi.org/10.1145/3011263.3011265","url":null,"abstract":"This paper will discuss the concept of a smart mirror for healthier living, the FitMirror. Many people have serious problems to get up after sleeping, to get motivated for the day, or are tired and in a bad mood in the morning. The goal of FitMirror is to positively affect the user's feelings by increasing his/her motivation, mood and feeling of fitness. While concepts for these isolated problems exist, none of these combine them into one system. FitMirror is implemented to combine them and evaluate them in a study. It consists of a monitor with spy-foil, a Microsoft Kinect v2 and a Wii Balance Board and can recognize users and their gestures with these elements. Several hypotheses about the system regarding motivation, fun, difficulty and getting awake were investigated. Participants were grouped by the factors sportspersons and morning persons to investigate the effect based on these aspects. Results show that FitMirror can help users get awake in the morning, raise their motivation to do sports and motivate them for the day.","PeriodicalId":272696,"journal":{"name":"Proceedings of the Workshop on Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128431675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Body movements and laughter recognition: experiments in first encounter dialogues","authors":"Kristiina Jokinen, Trung Ngo Trong, G. Wilcock","doi":"10.1145/3011263.3011264","DOIUrl":"https://doi.org/10.1145/3011263.3011264","url":null,"abstract":"This paper reports work on automatic analysis of laughter and human body movements in a video corpus of human-human dialogues. We use the Nordic First Encounters video corpus where participants meet each other for the first time. This corpus has manual annotations of participants' head, hand and body movements as well as laughter occurrences. We employ machine learning methods to analyse the corpus using two types of features: visual features that describe bounding boxes around participants' heads and bodies, automatically detecting body movements in the video, and audio speech features based on the participants' spoken contributions. We then correlate the speech and video features and apply neural network techniques to predict if a person is laughing or not given a sequence of video features. The hypothesis is that laughter occurrences and body movement are synchronized, or at least there is a significant relation between laughter activities and occurrences of body movements. Our results confirm the hypothesis of the synchrony of body movements with laughter, but we also emphasise the complexity of the problem and the need for further investigations on the feature sets and the algorithm used.","PeriodicalId":272696,"journal":{"name":"Proceedings of the Workshop on Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127932950","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}