{"title":"Session details: Oral Session 4: Nonverbal Behaviors","authors":"B. Sankur","doi":"10.1145/3246747","DOIUrl":"https://doi.org/10.1145/3246747","url":null,"abstract":"","PeriodicalId":389037,"journal":{"name":"Proceedings of the 16th International Conference on Multimodal Interaction","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130153568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Emotion Recognition in Real-world Conditions with Acoustic and Visual Features","authors":"M. Sidorov, W. Minker","doi":"10.1145/2663204.2666279","DOIUrl":"https://doi.org/10.1145/2663204.2666279","url":null,"abstract":"There is an enormous number of potential applications of the system which is capable to recognize human emotions. Such opportunity can be useful in various applications, e.g., improvement of Spoken Dialogue Systems (SDSs) or monitoring agents in call-centers. Therefore, the Emotion Recognition In The Wild Challenge 2014 (EmotiW 2014) is focused on estimating emotions in real-world situations. This study presents the results of multimodal emotion recognition based on support vector classifier. The described approach results in 41.77% of overall classification accuracy in the multimodal case. The obtained result is more than 17% higher than the baseline result for multimodal approach.","PeriodicalId":389037,"journal":{"name":"Proceedings of the 16th International Conference on Multimodal Interaction","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128994298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improved Spatiotemporal Local Monogenic Binary Pattern for Emotion Recognition in The Wild","authors":"Xiaohua Huang, Qiuhai He, Xiaopeng Hong, Guoying Zhao, M. Pietikäinen","doi":"10.1145/2663204.2666278","DOIUrl":"https://doi.org/10.1145/2663204.2666278","url":null,"abstract":"Local binary pattern from three orthogonal planes (LBP-TOP) has been widely used in emotion recognition in the wild. However, it suffers from illumination and pose changes. This paper mainly focuses on the robustness of LBP-TOP to unconstrained environment. Recent proposed method, spatiotemporal local monogenic binary pattern (STLMBP), was verified to work promisingly in different illumination conditions. Thus this paper proposes an improved spatiotemporal feature descriptor based on STLMBP. The improved descriptor uses not only magnitude and orientation, but also the phase information, which provide complementary information. In detail, the magnitude, orientation and phase images are obtained by using an effective monogenic filter, and multiple feature vectors are finally fused by multiple kernel learning. STLMBP and the proposed method are evaluated in the Acted Facial Expression in the Wild as part of the 2014 Emotion Recognition in the Wild Challenge. They achieve competitive results, with an accuracy gain of 6.35% and 7.65% above the challenge baseline (LBP-TOP) over video.","PeriodicalId":389037,"journal":{"name":"Proceedings of the 16th International Conference on Multimodal Interaction","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126758350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tactile Feedback for Above-Device Gesture Interfaces: Adding Touch to Touchless Interactions","authors":"Euan Freeman, S. Brewster, V. Lantz","doi":"10.1145/2663204.2663280","DOIUrl":"https://doi.org/10.1145/2663204.2663280","url":null,"abstract":"Above-device gesture interfaces let people interact in the space above mobile devices using hand and finger movements. For example, users could gesture over a mobile phone or wearable without having to use the touchscreen. We look at how above-device interfaces can also give feedback in the space over the device. Recent haptic and wearable technologies give new ways to provide tactile feedback while gesturing, letting touchless gesture interfaces give touch feedback. In this paper we take a first detailed look at how tactile feedback can be given during above-device interaction. We compare approaches for giving feedback (ultrasound haptics, wearables and direct feedback) and also look at feedback design. Our findings show that tactile feedback can enhance above-device gesture interfaces.","PeriodicalId":389037,"journal":{"name":"Proceedings of the 16th International Conference on Multimodal Interaction","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126691151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimodal Interaction History and its use in Error Detection and Recovery","authors":"Felix Schüssel, F. Honold, Miriam Schmidt, N. Bubalo, A. Huckauf, M. Weber","doi":"10.1145/2663204.2663255","DOIUrl":"https://doi.org/10.1145/2663204.2663255","url":null,"abstract":"Multimodal systems still tend to ignore the individual input behavior of users, and at the same time, suffer from erroneous sensor inputs. Although many researchers have described user behavior in specific settings and tasks, little to nothing is known about the applicability of such information, when it comes to increase the robustness of a system for multimodal inputs. We conducted a gamified experimental study to investigate individual user behavior and error types found in an actually running system. It is shown, that previous ways of describing input behavior by a simple classification scheme (like simultaneous and sequential) are not suited to build up an individual interaction history. Instead, we propose to use temporal distributions of different metrics derived from multimodal event timings. We identify the major errors that can occur in multimodal interactions and finally show how such an interaction history can practically be applied for error detection and recovery. Applying the proposed approach to the experimental data, the initial error rate is reduced from 4.9% to a minimum of 1.2%.","PeriodicalId":389037,"journal":{"name":"Proceedings of the 16th International Conference on Multimodal Interaction","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131841681","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Capturing Upper Body Motion in Conversation: An Appearance Quasi-Invariant Approach","authors":"Alvaro Marcos-Ramiro, Daniel Pizarro-Perez, Marta Marrón Romera, D. Gática-Pérez","doi":"10.1145/2663204.2663267","DOIUrl":"https://doi.org/10.1145/2663204.2663267","url":null,"abstract":"We address the problem of body communication retrieval and measuring in seated conversations by means of markerless motion capture. In psychological studies, the use of automatic methods is key to reduce the subjectivity present in manual behavioral coding used to extract these cues. These studies usually involve hundreds of subjects with different clothing, non-acted poses, or different distances to the camera in uncalibrated, RGB-only video. However, range cameras are not yet common in psychology research, especially in existing recordings. Therefore, it becomes highly relevant to develop a fast method that is able to work in these conditions. Given the known relationship between depth and motion estimates, we propose to robustly integrate highly appearance-invariant image motion features in a machine learning approach, complemented with an effective tracking scheme. We evaluate the method's performance with existing databases and a database of upper body poses displayed in job interviews that we make public, showing that in our scenario it is comparable to that of Kinect without using a range camera, and state-of-the-art w.r.t. the HumanEva and ChaLearn 2011 evaluation datasets.","PeriodicalId":389037,"journal":{"name":"Proceedings of the 16th International Conference on Multimodal Interaction","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133424104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Detection of Naturalistic Hand-over-Face Gesture Descriptors","authors":"M. Mahmoud, T. Baltrušaitis, P. Robinson","doi":"10.1145/2663204.2663258","DOIUrl":"https://doi.org/10.1145/2663204.2663258","url":null,"abstract":"One of the main factors that limit the accuracy of facial analysis systems is hand occlusion. As the face becomes occluded, facial features are either lost, corrupted or erroneously detected. Hand-over-face occlusions are considered not only very common but also very challenging to handle. Moreover, there is empirical evidence that some of these hand-over-face gestures serve as cues for recognition of cognitive mental states. In this paper, we detect hand-over-face occlusions and classify hand-over-face gesture descriptors in videos of natural expressions using multi-modal fusion of different state-of-the-art spatial and spatio-temporal features. We show experimentally that we can successfully detect face occlusions with an accuracy of 83%. We also demonstrate that we can classify gesture descriptors (hand shape, hand action and facial region occluded) significantly higher than a naive baseline. To our knowledge, this work is the first attempt to automatically detect and classify hand-over-face gestures in natural expressions.","PeriodicalId":389037,"journal":{"name":"Proceedings of the 16th International Conference on Multimodal Interaction","volume":"99 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133048561","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Emotion Recognition In The Wild Challenge 2014: Baseline, Data and Protocol","authors":"Abhinav Dhall, Roland Göcke, Jyoti Joshi, Karan Sikka, Tom Gedeon","doi":"10.1145/2663204.2666275","DOIUrl":"https://doi.org/10.1145/2663204.2666275","url":null,"abstract":"The Second Emotion Recognition In The Wild Challenge (EmotiW) 2014 consists of an audio-video based emotion classification challenge, which mimics the real-world conditions. Traditionally, emotion recognition has been performed on data captured in constrained lab-controlled like environment. While this data was a good starting point, such lab controlled data poorly represents the environment and conditions faced in real-world situations. With the exponential increase in the number of video clips being uploaded online, it is worthwhile to explore the performance of emotion recognition methods that work `in the wild'. The goal of this Grand Challenge is to carry forward the common platform defined during EmotiW 2013, for evaluation of emotion recognition methods in real-world conditions. The database in the 2014 challenge is the Acted Facial Expression In Wild (AFEW) 4.0, which has been collected from movies showing close-to-real-world conditions. The paper describes the data partitions, the baseline method and the experimental protocol.","PeriodicalId":389037,"journal":{"name":"Proceedings of the 16th International Conference on Multimodal Interaction","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115039344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Oral Session 5: Mobile and Urban Interaction","authors":"M. Sezgin","doi":"10.1145/3246750","DOIUrl":"https://doi.org/10.1145/3246750","url":null,"abstract":"","PeriodicalId":389037,"journal":{"name":"Proceedings of the 16th International Conference on Multimodal Interaction","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123954088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Realizing Robust Human-Robot Interaction under Real Environments with Noises","authors":"Takaaki Sugiyama","doi":"10.1145/2663204.2666283","DOIUrl":"https://doi.org/10.1145/2663204.2666283","url":null,"abstract":"A human speaker considers her interlocutor's situation when she determines to begin speaking in human-human interaction. We assume this tendency is also applicable to human-robot interaction when a human treats a humanoid robot as a social being and behaves as a cooperative user. As a part of this social norm, we have built a model of predicting when a user is likely to begin speaking to a humanoid robot. This proposed model can be used to prevent a robot from generating erroneous reactions by ignoring input noises. In my Ph.D. thesis, we will realize robust human-robot interaction under real environments with noises. To achieve this, we began constructing a robot dialogue system using multiple modalities, such as audio and visual, and the robot's posture information. We plan to: 1) construct a robot dialogue system, 2) develop systems using social norms, such as an input sound classifier, controlling user's untimely utterances, and estimating user's degree of urgency, and 3) extend it from a one-to-one dialogue system to a multi-party one.","PeriodicalId":389037,"journal":{"name":"Proceedings of the 16th International Conference on Multimodal Interaction","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123272900","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}