{"title":"Session details: Demo Session 1","authors":"K. Otsuka, L. Akarun","doi":"10.1145/3246743","DOIUrl":"https://doi.org/10.1145/3246743","url":null,"abstract":"","PeriodicalId":389037,"journal":{"name":"Proceedings of the 16th International Conference on Multimodal Interaction","volume":"142 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122432267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Impact of Changing Communication Practices","authors":"Ailbhe N. Finnerty","doi":"10.1145/2663204.2666285","DOIUrl":"https://doi.org/10.1145/2663204.2666285","url":null,"abstract":"Due to advancements in communication technologies, how we interact with each other has changed significantly. An advantage is being able to keep in touch (family, friends) and collaborate (colleagues) with others over large distances. However, these technologies can remove behavioural cues, such as, changes in tone, gesturing and posture which can add depth and meaning to an interaction. In this paper two studies are presented which investigate changing communication practices in 1) the workplace and in 2) a loosely connected social group. The interactions of the participants were analysed by comparing synchronous (occurring in real time; e.g. face to face) and asynchronous (delayed; email, sms) patterns of communication. The findings showed a prevalence of asynchronous methods of communication in Study 1, which had an impact on affective states (positive and negative) and on self reported measures of productivity, creativity, while in Study 2 synchronous communication patterns affected stress.","PeriodicalId":389037,"journal":{"name":"Proceedings of the 16th International Conference on Multimodal Interaction","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117324234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Emotion Recognition in the Wild: Incorporating Voice and Lip Activity in Multimodal Decision-Level Fusion","authors":"F. Ringeval, S. Amiriparian, F. Eyben, K. Scherer, Björn Schuller","doi":"10.1145/2663204.2666271","DOIUrl":"https://doi.org/10.1145/2663204.2666271","url":null,"abstract":"In this paper, we investigate the relevance of using voice and lip activity to improve performance of audiovisual emotion recognition in unconstrained settings, as part of the 2014 Emotion Recognition in the Wild Challenge (EmotiW14). Indeed, the dataset provided by the organisers contains movie excerpts with highly challenging variability in terms of audiovisual content; e.g., speech and/or face of the subject expressing the emotion can be absent in the data. We therefore propose to tackle this issue by incorporating both voice and lip activity as additional features in a decision-level fusion. Results obtained on the blind test set show that the decision-level fusion can improve the best mono-modal approach, and that the addition of both voice and lip activity in the feature set leads to the best performance (UAR=35.27%), with an absolute improvement of 5.36% over the baseline.","PeriodicalId":389037,"journal":{"name":"Proceedings of the 16th International Conference on Multimodal Interaction","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115313637","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bi-Modal Detection of Painful Reaching for Chronic Pain Rehabilitation Systems","authors":"Temitayo A. Olugbade, M. Aung, N. Bianchi-Berthouze, Nicolai Marquardt, A. Williams","doi":"10.1145/2663204.2663261","DOIUrl":"https://doi.org/10.1145/2663204.2663261","url":null,"abstract":"Physical activity is essential in chronic pain rehabilitation. However, anxiety due to pain or a perceived exacerbation of pain causes people to guard against beneficial exercise. Interactive rehabiliation technology sensitive to such behaviour could provide feedback to overcome such psychological barriers. To this end, we developed a Support Vector Machine framework with the feature level fusion of body motion and muscle activity descriptors to discriminate three levels of pain (none, low and high). All subjects underwent a forward reaching exercise which is typically feared among people with chronic back pain. The levels of pain were categorized from control subjects (no pain) and thresholded self reported levels from people with chronic pain. Salient features were identified using a backward feature selection process. Using feature sets from each modality separately led to high pain classification F1 scores of 0.63 and 0.69 for movement and muscle activity respectively. However using a combined bimodal feature set this increased to F1 = 0.8.","PeriodicalId":389037,"journal":{"name":"Proceedings of the 16th International Conference on Multimodal Interaction","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130724960","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Supporting Non-linear Navigation in Educational Videos","authors":"Kuldeep Yadav, K. Srivastava, Om Deshmukh","doi":"10.1145/2663204.2669630","DOIUrl":"https://doi.org/10.1145/2663204.2669630","url":null,"abstract":"MOOC participants spend most of their time watching videos in a course and recent studies have found that there is a requirement of non-linear navigation system for educational videos. We propose a system that provides efficient and non-linear navigation in a given video using multi-modal dimensions (i.e. customized word-cloud, image cloud) derived from video content. The end-to-end system is implemented and we demonstrate capabilities of proposed system using any given YouTube or MOOC video.","PeriodicalId":389037,"journal":{"name":"Proceedings of the 16th International Conference on Multimodal Interaction","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130755707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gesture Heatmaps: Understanding Gesture Performance with Colorful Visualizations","authors":"Radu-Daniel Vatavu, Lisa Anthony, J. Wobbrock","doi":"10.1145/2663204.2663256","DOIUrl":"https://doi.org/10.1145/2663204.2663256","url":null,"abstract":"We introduce gesture heatmaps, a novel gesture analysis technique that employs color maps to visualize the variation of local features along the gesture path. Beyond current gesture analysis practices that characterize gesture articulations with single-value descriptors, e.g., size, path length, or speed, gesture heatmaps are able to show with colorful visualizations how the value of any such descriptors vary along the gesture path. We evaluate gesture heatmaps on three public datasets comprising 15,840 gesture samples of 70 gesture types from 45 participants, on which we demonstrate heatmaps' capabilities to (1) explain causes for recognition errors, (2) characterize users' gesture articulation patterns under various conditions, e.g., finger versus pen gestures, and (3) help understand users' subjective perceptions of gesture commands, such as why some gestures are perceived easier to execute than others. We also introduce chromatic confusion matrices that employ gesture heatmaps to extend the expressiveness of standard confusion matrices to better understand gesture classification performance. We believe that gesture heatmaps will prove useful to researchers and practitioners doing gesture analysis, and consequently, they will inform the design of better gesture sets and development of more accurate recognizers.","PeriodicalId":389037,"journal":{"name":"Proceedings of the 16th International Conference on Multimodal Interaction","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128384220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unsupervised Domain Adaptation for Personalized Facial Emotion Recognition","authors":"Gloria Zen, E. Sangineto, E. Ricci, N. Sebe","doi":"10.1145/2663204.2663247","DOIUrl":"https://doi.org/10.1145/2663204.2663247","url":null,"abstract":"The way in which human beings express emotions depends on their specific personality and cultural background. As a consequence, person independent facial expression classifiers usually fail to accurately recognize emotions which vary between different individuals. On the other hand, training a person-specific classifier for each new user is a time consuming activity which involves collecting hundreds of labeled samples. In this paper we present a personalization approach in which only unlabeled target-specific data are required. The method is based on our previous paper [20] in which a regression framework is proposed to learn the relation between the user's specific sample distribution and the parameters of her/his classifier. Once this relation is learned, a target classifier can be constructed using only the new user's sample distribution to transfer the personalized parameters. The novelty of this paper with respect to [20] is the introduction of a new method to represent the source sample distribution based on using only the Support Vectors of the source classifiers. Moreover, we present here a simplified regression framework which achieves the same or even slightly superior experimental results with respect to [20] but it is much easier to reproduce.","PeriodicalId":389037,"journal":{"name":"Proceedings of the 16th International Conference on Multimodal Interaction","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128458451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Computation of Emotions","authors":"P. Robinson","doi":"10.1145/2663204.2669638","DOIUrl":"https://doi.org/10.1145/2663204.2669638","url":null,"abstract":"When people talk to each other, they express their feelings through facial expressions, tone of voice, body postures and gestures. They even do this when they are interacting with machines. These hidden signals are an important part of human communication, but most computer systems ignore them. Emotions need to be considered as an important mode of communication between people and interactive systems. Affective computing has enjoyed considerable success over the past 20 years, but many challenges remain.","PeriodicalId":389037,"journal":{"name":"Proceedings of the 16th International Conference on Multimodal Interaction","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125258412","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Eye Gaze for Spoken Language Understanding in Multi-modal Conversational Interactions","authors":"Dilek Z. Hakkani-Tür, M. Slaney, Asli Celikyilmaz, Larry Heck","doi":"10.1145/2663204.2663277","DOIUrl":"https://doi.org/10.1145/2663204.2663277","url":null,"abstract":"When humans converse with each other, they naturally amalgamate information from multiple modalities (i.e., speech, gestures, speech prosody, facial expressions, and eye gaze). This paper focuses on eye gaze and its combination with speech. We develop a model that resolves references to visual (screen) elements in a conversational web browsing system. The system detects eye gaze, recognizes speech, and then interprets the user's browsing intent (e.g., click on a specific element) through a combination of spoken language understanding and eye gaze tracking. We experiment with multi-turn interactions collected in a wizard-of-Oz scenario where users are asked to perform several web-browsing tasks. We compare several gaze features and evaluate their effectiveness when combined with speech-based lexical features. The resulting multi-modal system not only increases user intent (turn) accuracy by 17%, but also resolves the referring expression ambiguity commonly observed in dialog systems with a 10% increase in F-measure.","PeriodicalId":389037,"journal":{"name":"Proceedings of the 16th International Conference on Multimodal Interaction","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114170310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Blinking Detection towards Stress Discovery","authors":"Alvaro Marcos-Ramiro, Daniel Pizarro-Perez, Marta Marrón Romera, D. Gática-Pérez","doi":"10.1145/2663204.2663239","DOIUrl":"https://doi.org/10.1145/2663204.2663239","url":null,"abstract":"We present a robust method to automatically detect blinks in video sequences of conversations, aimed to discovering stress. Psychological studies have shown a relationship between blink frequency and dopamine levels, which in turn are affected by stress. Task performance correlates through an inverted U shape to both dopamine and stress levels. This shows the importance of automatic blink detection as a way of reducing human coding burden. We use an off-the-shelf face tracker in order to extract the eye region. Then, we perform per-pixel classification of the extracted eye images to later identify blinks through their dynamics. We evaluate the performance of our system with a job interview database with annotations of psychological variables, and show statistically significant correlation between perceived stress resistance and the automatically detected blink patterns.","PeriodicalId":389037,"journal":{"name":"Proceedings of the 16th International Conference on Multimodal Interaction","volume":"53 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124300545","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}