{"title":"Adaptive Review for Mobile MOOC Learning via Multimodal Physiological Signal Sensing - A Longitudinal Study","authors":"Phuong Pham, Jingtao Wang","doi":"10.1145/3242969.3243002","DOIUrl":"https://doi.org/10.1145/3242969.3243002","url":null,"abstract":"Despite the great potential, Massive Open Online Courses (MOOCs) face major challenges such as low retention rate, limited feedback, and lack of personalization. In this paper, we report the results of a longitudinal study on AttentiveReview2, a multimodal intelligent tutoring system optimized for MOOC learning on unmodified mobile devices. AttentiveReview2 continuously monitors learners' physiological signals, facial expressions, and touch interactions during learning and recommends personalized review materials by predicting each learner's perceived difficulty on each learning topic. In a 3-week study involving 28 learners, we found that AttentiveReview2 on average improved learning gains by 21.8% in weekly tests. Follow-up analysis shows that multi-modal signals collected from the learning process can also benefit instructors by providing rich and fine-grained insights on the learning progress. Taking advantage of such signals also improves prediction accuracies in emotion and test scores when compared with clickstream analysis.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131403324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gazeover -- Exploring the UX of Gaze-triggered Affordance Communication for GUI Elements","authors":"Ilhan Aslan, Michael Dietz, E. André","doi":"10.1145/3242969.3242987","DOIUrl":"https://doi.org/10.1145/3242969.3242987","url":null,"abstract":"The user experience (UX) of graphical user interfaces (GUIs) often depends on how clearly visual designs communicate/signify \"affordances\", such as if an element on the screen can be pushed, dragged, or rotated. Especially for novice users figuring out the complexity of a new interface can be cumbersome. In the \"past\" era of mouse-based interaction mouseover effects were successfully utilized to trigger a variety of assistance, and help users in exploring interface elements without causing unintended interactions and associated negative experiences. Today's GUIs are increasingly designed for touch and lack a method similiar to mouseover to help (novice) users to get acquainted with interface elements. In order to address this issue, we have studied gazeover, as a technique for triggering \"help or guidance\" when a user's gaze is over an interactive element, which we believe is suitable for today's touch interfaces. We report on a user study comparing pragmatic and hedonic qualities of gazeover and mouseover, which showed significant higher ratings in hedonic quality for the gazeover technique. We conclude by discussing limitations and implications of our findings.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131805344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Introducing WESAD, a Multimodal Dataset for Wearable Stress and Affect Detection","authors":"P. Schmidt, Attila Reiss, R. Dürichen, C. Marberger, Kristof Van Laerhoven","doi":"10.1145/3242969.3242985","DOIUrl":"https://doi.org/10.1145/3242969.3242985","url":null,"abstract":"Affect recognition aims to detect a person's affective state based on observables, with the goal to e.g. improve human-computer interaction. Long-term stress is known to have severe implications on wellbeing, which call for continuous and automated stress monitoring systems. However, the affective computing community lacks commonly used standard datasets for wearable stress detection which a) provide multimodal high-quality data, and b) include multiple affective states. Therefore, we introduce WESAD, a new publicly available dataset for wearable stress and affect detection. This multimodal dataset features physiological and motion data, recorded from both a wrist- and a chest-worn device, of 15 subjects during a lab study. The following sensor modalities are included: blood volume pulse, electrocardiogram, electrodermal activity, electromyogram, respiration, body temperature, and three-axis acceleration. Moreover, the dataset bridges the gap between previous lab studies on stress and emotions, by containing three different affective states (neutral, stress, amusement). In addition, self-reports of the subjects, which were obtained using several established questionnaires, are contained in the dataset. Furthermore, a benchmark is created on the dataset, using well-known features and standard machine learning methods. Considering the three-class classification problem ( baseline vs. stress vs. amusement ), we achieved classification accuracies of up to 80%,. In the binary case ( stress vs. non-stress ), accuracies of up to 93%, were reached. Finally, we provide a detailed analysis and comparison of the two device locations ( chest vs. wrist ) as well as the different sensor modalities.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122386607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Modal Multi sensor Interaction between Human andHeterogeneous Multi-Robot System","authors":"S. Mahi","doi":"10.1145/3242969.3264971","DOIUrl":"https://doi.org/10.1145/3242969.3264971","url":null,"abstract":"I introduce a novel multi-modal multi-sensor interaction method between humans and heterogeneous multi-robot systems. I have also developed a novel algorithm to control heterogeneous multi-robot systems. The proposed algorithm allows the human operator to provide intentional cues and information to a multi-robot system using a multimodal multi-sensor touchscreen interface. My proposed method can effectively convey complex human intention to multiple robots as well as represent robots' intentions over the spatiotemporal domain. The proposed method is scalable and robust to dynamic change in the deployment configuration. I describe the implementation of the control algorithm used to control multiple quad-rotor unmanned aerial vehicles in simulated and real environments. I will also present my initial work on human interaction with the robots running my algorithm using mobile phone touch screens and other potential multimodal interactions.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124999625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detecting User's Likes and Dislikes for a Virtual Negotiating Agent","authors":"Caroline Langlet, C. Clavel","doi":"10.1145/3242969.3243024","DOIUrl":"https://doi.org/10.1145/3242969.3243024","url":null,"abstract":"This article tackles the issue of the detection of the user's likes and dislikes in a negotiation with a virtual agent for helping the creation of a model of user's preferences. We introduce a linguistic model of user's likes and dislikes as they are expressed in a negotiation context. The identification of syntactic and semantic features enables the design of formal grammars embedded in a bottom-up and rule-based system. It deals with conversational context by considering simple and collaborative likes and dislikes within adjacency pairs. We present the annotation campaign we conduct by recruiting annotators on CrowdFlower and using a dedicated annotation platform. Finally, we measure agreement between our system and the human reference. The obtained scores show substantial agreement.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125077006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Path Word: A Multimodal Password Entry Method for Ad-hoc Authentication Based on Digits' Shape and Smooth Pursuit Eye Movements","authors":"Almoctar Hassoumi, Pourang Irani, Vsevolod Peysakhovich, C. Hurter","doi":"10.1145/3242969.3243008","DOIUrl":"https://doi.org/10.1145/3242969.3243008","url":null,"abstract":"We present PathWord (PATH passWORD), a multimodal digit entry method for ad-hoc authentication based on known digits shape and user relative eye movements. PathWord is a touch-free, gaze-based input modality, which attempts to decrease shoulder surfing attacks when unlocking a system using PINs. The system uses a modified web camera to detect the user's eye. This enables suppressing direct touch, making it difficult for passer-bys to be aware of the input digits, thus reducing shoulder surfing and smudge attacks. In addition to showing high accuracy rates (Study 1: 87.1% successful entries) and strong confidentiality through detailed evaluations with 42 participants (Study 2), we demonstrate how PathWord considerably diminishes the potential of stolen passwords (on average 2.38% stolen passwords with PathWord vs. over 90% with traditional PIN screen). We show use-cases of PathWord and discuss its advantages over traditional input modalities. We envision PathWord as a method to foster confidence while unlocking a system through gaze gestures.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129160128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimodal Control of Lighter-Than-Air Agents","authors":"D. Lofaro, D. Sofge","doi":"10.1145/3242969.3266296","DOIUrl":"https://doi.org/10.1145/3242969.3266296","url":null,"abstract":"This work describes our approach to controlling lighter-than-air agents using multimodal control via a wearable device. Tactile and gesture interfaces on a smart watch are used to control the motion and altitude of these semi-autonomous agents. The tactile interface consists of the touch screen and rotatable bezel. The gesture interface detects when the user puts his/her hand in the stop position. The touch interface controls the direction of the agents, the rotatable bezel controls the altitude set-point, and the gesture interface stops the agents. Our interactive demonstration will allow users to control a lighter-than-air (LTA) system via the multimodal wearable interface as described above.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128205689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Attentive Speed Reading on Small Screen Wearable Devices","authors":"Wei Guo, Jingtao Wang","doi":"10.1145/3242969.3243009","DOIUrl":"https://doi.org/10.1145/3242969.3243009","url":null,"abstract":"Smart watches can enrich everyday interactions by providing both glanceable information and instant access to frequent tasks. However, reading text messages on a 1.5-inch small screen is inherently challenging, especially when a user's attention is divided. We present SmartRSVP, an attentive speed-reading system to facilitate text reading on small-screen wearable devices. SmartRSVP leverages camera-based visual attention tracking and implicit physiological signal sensing to make text reading via Rapid Serial Visual Presentation (RSVP) more enjoyable and practical on smart watches. Through a series of three studies involving 40 participants, we found that 1) SmartRSVP can achieve a significantly higher comprehension rate (57.5% vs. 23.9%) and perceived comfort (3.8 vs. 2.1) than traditional RSVP; 2) Users prefer SmartRSVP over traditional reading interfaces when they walk and read; 3) SmartRSVP can predict users' cognitive workloads and adjust the reading speed accordingly in real-time with 83.3% precision.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114572312","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ichiro Umata, Koki Ijuin, T. Kato, Seiichi Yamamoto
{"title":"Floor Apportionment and Mutual Gazes in Native and Second-Language Conversation","authors":"Ichiro Umata, Koki Ijuin, T. Kato, Seiichi Yamamoto","doi":"10.1145/3242969.3242991","DOIUrl":"https://doi.org/10.1145/3242969.3242991","url":null,"abstract":"Quantitative analysis of gazes between a speaker and listeners was conducted from the viewpoint of mutual activities in floor apportionment, with the assumption that mutual gaze plays an important role in coordinating speech interaction. We conducted correlation analyses of the speaker's and listener's gazes in a three-party conversation, comparing native language (L1) and second language (L2) interaction in two types (free-flowing and goal-orient- ed). The analyses showed significant correlations between gazes from the current to the next speaker and those from the next to the current speaker during utterances preceding a speaker change in L1 conversation, suggesting that the participants were coordinating their speech turns with mutual gazes. In L2 conversation, however, such a correlation was found only in the goal-oriented type, suggesting that linguistic proficiency may affect the floor-apportionment function of mutual gazes, possibly because of the cognitive load of understanding/producing utterances.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130407461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Video-based Emotion Recognition Using Deeply-Supervised Neural Networks","authors":"Yingruo Fan, J. Lam, V. Li","doi":"10.1145/3242969.3264978","DOIUrl":"https://doi.org/10.1145/3242969.3264978","url":null,"abstract":"Emotion recognition (ER) based on natural facial images/videos has been studied for some years and considered a comparatively hot topic in the field of affective computing. However, it remains a challenge to perform ER in the wild, given the noises generated from head pose, face deformation, and illumination variation. To address this challenge, motivated by recent progress in Convolutional Neural Network (CNN), we develop a novel deeply supervised CNN (DSN) architecture, taking the multi-level and multi-scale features extracted from different convolutional layers to provide a more advanced representation of ER. By embedding a series of side-output layers, our DSN model provides class-wise supervision and integrates predictions from multiple layers. Finally, our team ranked 3rd at the EmotiW 2018 challenge with our model achieving an accuracy of 61.1%.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"51 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131653617","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}