{"title":"User Independent Gaze Estimation by Exploiting Similarity Measures in the Eye Pair Appearance Eigenspace","authors":"Nanxiang Li, C. Busso","doi":"10.1145/2663204.2663250","DOIUrl":"https://doi.org/10.1145/2663204.2663250","url":null,"abstract":"The design of gaze-based computer interfaces has been an active research area for over 40 years. One challenge of using gaze detectors is the repetitive calibration process required to adjust the parameters of the systems, and the constrained conditions imposed on the user for robust gaze estimation. We envision user-independent gaze detectors that do not require calibration, or any cooperation from the user. Toward this goal, we investigate an appearance-based approach, where we estimate the eigenspace for the gaze using principal component analysis (PCA). The projections are used as features of regression models that estimate the screen's coordinates. As expected, the performance of the approach decreases when the models are trained without data from the target user (i.e., user-independent condition). This study proposes an appealing training approach to bridge the gap in performance between user-dependent and user-independent conditions. Using the projections onto the eigenspace, the scheme identifies samples in training set that are similar to the testing images. We build the sample covariance matrix and the regression models only with these samples. We consider either similar frames or data from subjects with similar eye appearance. The promising results suggest that the proposed training approach is a feasible and convenient scheme for gaze-based multimodal interfaces.","PeriodicalId":389037,"journal":{"name":"Proceedings of the 16th International Conference on Multimodal Interaction","volume":"219 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132504077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Oral Session 2: Multimodal Fusion","authors":"Björn Schuller","doi":"10.1145/3246742","DOIUrl":"https://doi.org/10.1145/3246742","url":null,"abstract":"","PeriodicalId":389037,"journal":{"name":"Proceedings of the 16th International Conference on Multimodal Interaction","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114368163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The SWELL Knowledge Work Dataset for Stress and User Modeling Research","authors":"Saskia Koldijk, Maya Sappelli, S. Verberne, Mark Antonius Neerincx, Wessel Kraaij","doi":"10.1145/2663204.2663257","DOIUrl":"https://doi.org/10.1145/2663204.2663257","url":null,"abstract":"This paper describes the new multimodal SWELL knowledge work (SWELL-KW) dataset for research on stress and user modeling. The dataset was collected in an experiment, in which 25 people performed typical knowledge work (writing reports, making presentations, reading e-mail, searching for information). We manipulated their working conditions with the stressors: email interruptions and time pressure. A varied set of data was recorded: computer logging, facial expression from camera recordings, body postures from a Kinect 3D sensor and heart rate (variability) and skin conductance from body sensors. The dataset made available not only contains raw data, but also preprocessed data and extracted features. The participants' subjective experience on task load, mental effort, emotion and perceived stress was assessed with validated questionnaires as a ground truth. The resulting dataset on working behavior and affect is a valuable contribution to several research fields, such as work psychology, user modeling and context aware systems.","PeriodicalId":389037,"journal":{"name":"Proceedings of the 16th International Conference on Multimodal Interaction","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114462593","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"System for Presenting and Creating Smell Effects to Video","authors":"R. Suzuki, Shutaro Homma, Eri Matsuura, Ken-ichi Okada","doi":"10.1145/2663204.2663269","DOIUrl":"https://doi.org/10.1145/2663204.2663269","url":null,"abstract":"Olfaction has recently been gaining attention in information and communication technology, as shown by attempts in theaters to screen videos while emitting scents. However, because there is no current infrastructure to communicate and synchronize odor information with visual information, people cannot enjoy this experience at home. Therefore, we have constructed a system of smell videos which could be applied to television (TV), allowing viewers to experience scents while watching their videos. To solve the abovementioned technical problems, we propose using the existing system for broadcasting closed caption. Our system's implementation is mindful of both video viewers and producers, allowing the system on the viewer end to disperse odorants in synchronization with videos, and allowing producers to add odor information to videos. We finally verify the system's feasibility. We expect that this study will make smell videos become common, and people will enjoy ones in daily life in the near future.","PeriodicalId":389037,"journal":{"name":"Proceedings of the 16th International Conference on Multimodal Interaction","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128528747","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimodal Interaction for Future Control Centers: An Interactive Demonstrator","authors":"Ferdinand Fuhrmann, Rene Kaiser","doi":"10.1145/2663204.2669620","DOIUrl":"https://doi.org/10.1145/2663204.2669620","url":null,"abstract":"This interactive demo exhibits a visionary multimodal interaction concept designed to support operators in future control centers. The applied multi-layered hardware and software architecture directly supports the operators in performing their lengthy monitoring and urgent alarm handling tasks. Operators are presented with visual information on three completely configurable levels of screen displays. Gesture interaction via skeleton and finger tracking acts as the main control interaction principle. We particularly developed a special sensor-equipped chair as well as an audio interface allowing for speaking and listening in isolation without any wearable device.","PeriodicalId":389037,"journal":{"name":"Proceedings of the 16th International Conference on Multimodal Interaction","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128687323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deception detection using a multimodal approach","authors":"M. Abouelenien, Verónica Pérez-Rosas, Rada Mihalcea, Mihai Burzo","doi":"10.1145/2663204.2663229","DOIUrl":"https://doi.org/10.1145/2663204.2663229","url":null,"abstract":"In this paper we address the automatic identification of deceit by using a multimodal approach. We collect deceptive and truthful responses using a multimodal setting where we acquire data using a microphone, a thermal camera, as well as physiological sensors. Among all available modalities, we focus on three modalities namely, language use, physiological response, and thermal sensing. To our knowledge, this is the first work to integrate these specific modalities to detect deceit. Several experiments are carried out in which we first select representative features for each modality, and then we analyze joint models that integrate several modalities. The experimental results show that the combination of features from different modalities significantly improves the detection of deceptive behaviors as compared to the use of one modality at a time. Moreover, the use of non-contact modalities proved to be comparable with and sometimes better than existing contact-based methods. The proposed method increases the efficiency of detecting deceit by avoiding human involvement in an attempt to move towards a completely automated non-invasive deception detection process.","PeriodicalId":389037,"journal":{"name":"Proceedings of the 16th International Conference on Multimodal Interaction","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126000741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Multimodal Fusion: Combining Discrete Events and Continuous Signals","authors":"H. P. Martínez, Georgios N. Yannakakis","doi":"10.1145/2663204.2663236","DOIUrl":"https://doi.org/10.1145/2663204.2663236","url":null,"abstract":"Multimodal datasets often feature a combination of continuous signals and a series of discrete events. For instance, when studying human behaviour it is common to annotate actions performed by the participant over several other modalities such as video recordings of the face or physiological signals. These events are nominal, not frequent and are not sampled at a continuous rate while signals are numeric and often sampled at short fixed intervals. This fundamentally different nature complicates the analysis of the relation among these modalities which is often studied after each modality has been summarised or reduced. This paper investigates a novel approach to model the relation between such modality types bypassing the need for summarising each modality independently of each other. For that purpose, we introduce a deep learning model based on convolutional neural networks that is adapted to process multiple modalities at different time resolutions we name deep multimodal fusion. Furthermore, we introduce and compare three alternative methods (convolution, training and pooling fusion) to integrate sequences of events with continuous signals within this model. We evaluate deep multimodal fusion using a game user dataset where player physiological signals are recorded in parallel with game events. Results suggest that the proposed architecture can appropriately capture multimodal information as it yields higher prediction accuracies compared to single-modality models. In addition, it appears that pooling fusion, based on a novel filter-pooling method provides the more effective fusion approach for the investigated types of data.","PeriodicalId":389037,"journal":{"name":"Proceedings of the 16th International Conference on Multimodal Interaction","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125031884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A World without Barriers: Connecting the World across Languages, Distances and Media","authors":"A. Waibel","doi":"10.1145/2663204.2669986","DOIUrl":"https://doi.org/10.1145/2663204.2669986","url":null,"abstract":"As our world becomes increasingly interdependent and globalization brings people together more than ever, we quickly discover that it is no longer the absence of connectivity (the \"digital divide\") that separates us, but that new and different forms of alienation still keep us apart, including language, culture, distance and interfaces. Can technology provide solutions to bring us closer to our fellow humans? In this talk, I will present multilingual and multimodal interface technology solutions that offer the best of both worlds: maintaining our cultural diversity and locale while providing for better communication, greater integration and collaboration. We explore: Smart phone based speech translators for everyday travelers and humanitarian missions Simultaneous translation systems and services to translate academic lectures and political speeches in real time (at Universities, the European Parliament and broadcasting services) Multimodal language-transparent interfaces and smartrooms to improve joint and distributed communication and interaction. We will first discuss the difficulties of language processing; review how the technology works today and what levels of performance are now possible. Key to today's systems is effective machine learning, without which scaling multilingual and multimodal systems to unlimited domains, modalities, accents, and more than 6,000 languages would be hopeless. Equally important are effective human-computer interfaces, so that language differences fade naturally into the background and communication and interaction become natural and engaging. I will present recent research results as well as examples from our field trials and deployments in educational, commercial, humanitarian and government settings.","PeriodicalId":389037,"journal":{"name":"Proceedings of the 16th International Conference on Multimodal Interaction","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129741869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Facial Expression Analysis for Estimating Pain in Clinical Settings","authors":"Karan Sikka","doi":"10.1145/2663204.2666282","DOIUrl":"https://doi.org/10.1145/2663204.2666282","url":null,"abstract":"Pain assessment is vital for effective pain management in clinical settings. It is generally obtained via patient's self-report or observer's assessment. Both of these approaches suffer from several drawbacks such as unavailability of self-report, idiosyncratic use and observer bias. This work aims at developing automated machine learning based approaches for estimating pain in clinical settings. We propose to use facial expression information to accomplish current goals since previous studies have demonstrated consistency between facial behavior and experienced pain. Moreover, with recent advances in computer vision it is possible to design algorithms for identifying spontaneous expressions such as pain in more naturalistic conditions. Our focus is towards designing robust computer vision models for estimating pain in videos containing patient's facial behavior. In this regard we discuss different research problem, technical approaches and challenges that needs to be addressed. In this work we particularly highlight the problem of predicting self-report measures of pain intensity since this problem is not only more challenging but also received less attention. We also discuss our efforts towards collecting an in-situ pediatric pain dataset for validating these approaches. We conclude the paper by presenting some results on both UNBC Mc-Master Pain dataset and pediatric pain dataset.","PeriodicalId":389037,"journal":{"name":"Proceedings of the 16th International Conference on Multimodal Interaction","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126555811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Appearance based user-independent gaze estimation","authors":"Nanxiang Li","doi":"10.1145/2663204.2666288","DOIUrl":"https://doi.org/10.1145/2663204.2666288","url":null,"abstract":"An ideal gaze user interface should be able to accurately estimates the user's gaze direction in a non-intrusive setting. Most studies on gaze estimation focus on the accuracy of the estimation results, imposing important constraints on the user such as no head movement, intrusive head mount setting and repetitive calibration process. Due to these limitations, most graphic user interfaces (GUIs) are reluctant to include gaze as an input modality. We envision user-independent gaze detectors for user computer interaction that do not impose any constraints on the users. We believe the appearance of the eye pairs, which implicitly reveals head pose, provides conclusive information on the gaze direction. More importantly, the relative appearance changes in the eye pairs due to the different gaze direction should be consistent among different human subjects. We collected a multimodal corpus (MSP-GAZE) to study and evaluate user independent, appearance based gaze estimation approaches. This corpus considers important factors that affect the appearance based gaze estimation: the individual difference, the head movement, and the distance between the user and the interface's screen. Using this database, our initial study focused on the eye pair appearance eigenspace approach, where the projections into the eye appearance eigenspace basis are used to build regression models to estimate the gaze position. We compare the results between user dependent (training and testing on the same subject) and user independent (testing subject is not included in the training data) models. As expected, due to the individual differences between subjects, the performance decreases when the models are trained without data from the target user. The study aims to reduce the gap between user dependent and user independent conditions.","PeriodicalId":389037,"journal":{"name":"Proceedings of the 16th International Conference on Multimodal Interaction","volume":"80 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121350365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}