Title: Job Interviewer Android with Elaborate Follow-up Question Generation
Authors: K. Inoue, Kohei Hara, Divesh Lala, Kenta Yamamoto, Shizuka Nakamura, K. Takanashi, Tatsuya Kawahara
DOI: https://doi.org/10.1145/3382507.3418839
Published in: Proceedings of the 2020 International Conference on Multimodal Interaction (2020-10-21)
Abstract: A job interview is a domain that takes advantage of an android robot's human-like appearance and behaviors. In this work, our goal is to implement a system in which an android plays the role of an interviewer so that users may practice for a real job interview. Our proposed system generates elaborate follow-up questions based on responses from the interviewee. We conducted an interactive experiment to compare the proposed system against a baseline system that asked only fixed-form questions. We found that this system was significantly better than the baseline system with respect to the impression of the interview and the quality of the questions, and that the presence of the android interviewer was enhanced by the follow-up questions. We also found a similar result when using a virtual agent interviewer, except that presence was not enhanced.

Title: Workshop on Interdisciplinary Insights into Group and Team Dynamics
Authors: H. Hung, Gabriel Murray, G. Varni, N. Lehmann-Willenbrock, Fabiola H. Gerpott, Catharine Oertel
DOI: https://doi.org/10.1145/3382507.3419748
Published in: Proceedings of the 2020 International Conference on Multimodal Interaction (2020-10-21)
Abstract: There has been gathering momentum over the last 10 years in the study of group behavior in multimodal multiparty interactions. While many works in the computer science community focus on the analysis of individual or dyadic interactions, we believe that the study of groups adds an additional layer of complexity with respect to how humans cooperate and what outcomes can be achieved in these settings. Moreover, the development of technologies that can help to interpret and enhance group behaviours dynamically is still an emerging field. Social theories that accompany the study of group dynamics are in their infancy, and there is a need for more interdisciplinary dialogue between computer scientists and social scientists on this topic. This workshop has been organised to facilitate those discussions and strengthen the bonds between these overlapping research communities.

Title: The First International Workshop on Multi-Scale Movement Technologies
Authors: Eleonora Ceccaldi, B. Bardy, N. Bianchi-Berthouze, L. Fadiga, G. Volpe, A. Camurri
DOI: https://doi.org/10.1145/3382507.3420060
Published in: Proceedings of the 2020 International Conference on Multimodal Interaction (2020-10-21)
Abstract: Multimodal interfaces pose the challenge of dealing with the multiple interactive time-scales characterizing human behavior. To do this, innovative models and time-adaptive technologies are needed, operating at multiple time-scales and adopting a multi-layered approach. The first International Workshop on Multi-Scale Movement Technologies, hosted virtually during the 22nd ACM International Conference on Multimodal Interaction, is aimed at providing researchers from different areas with the opportunity to discuss this topic. This paper summarizes the activities of the workshop and the accepted papers.

Title: Social Affective Multimodal Interaction for Health
Authors: Hiroki Tanaka, Satoshi Nakamura, Jean-Claude Martin, C. Pelachaud
DOI: https://doi.org/10.1145/3382507.3420059
Published in: Proceedings of the 2020 International Conference on Multimodal Interaction (2020-10-21)
Abstract: This workshop discusses how interactive, multimodal technology such as virtual agents can be used in social skills training for measuring and training social-affective interactions. Sensing technology now enables analyzing users' behaviors and physiological signals. Various signal processing and machine learning methods can be used for such prediction tasks. Such social signal processing and tools can be applied to measure and reduce social stress in everyday situations, including public speaking at schools and workplaces.
{"title":"Estimating the Intensity of Facial Expressions Accompanying Feedback Responses in Multiparty Video-Mediated Communication","authors":"Ryosuke Ueno, Y. Nakano, Jie Zeng, Fumio Nihei","doi":"10.1145/3382507.3418878","DOIUrl":"https://doi.org/10.1145/3382507.3418878","url":null,"abstract":"Providing feedback to a speaker is an essential communication signal for maintaining a conversation. In specific feedback, which indicates the listener's reaction to the speaker?s utterances, the facial expression is an effective modality for conveying the listener's reactions. Moreover, not only the type of facial expressions, but also the degree of intensity of the expressions, may influence the meaning of the specific feedback. In this study, we propose a multimodal deep neural network model that predicts the intensity of facial expressions co-occurring with feedback responses. We focus on multiparty video-mediated communication. In video-mediated communication, close-up frontal face images of each participant are continuously presented on the display; the attention of the participants is more likely to be drawn to the facial expressions. We assume that in such communication, the importance of facial expression in the listeners? feedback responses increases. We collected 33 video-mediated conversations by groups of three people and obtained audio and speech data for each participant. Using the corpus collected as a dataset, we created a deep neural network model that predicts the intensity of 17 types of action units (AUs) co-occurring with the feedback responses. The proposed method employed GRU-based model with attention mechanism for audio, visual, and language modalities. A decoder was trained to produce the intensity values for the 17 AUs frame by frame. In the experiment, unimodal and multimodal models were compared in terms of their performance in predicting salient AUs that characterize facial expression in feedback responses. The results suggest that well-performing models differ depending on the AU categories; audio information was useful for predicting AUs that express happiness, and visual and language information contributes to predicting AUs expressing sadness and disgust.","PeriodicalId":402394,"journal":{"name":"Proceedings of the 2020 International Conference on Multimodal Interaction","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116892857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Title: A Multi-modal System to Assess Cognition in Children from their Physical Movements
Authors: Ashwin Ramesh Babu, Mohammad Zaki Zadeh, Ashish Jaiswal, Alexis Lueckenhoff, Maria Kyrarini, F. Makedon
DOI: https://doi.org/10.1145/3382507.3418829
Published in: Proceedings of the 2020 International Conference on Multimodal Interaction (2020-10-21)
Abstract: In recent years, computer- and game-based cognitive tests have become popular with the advancement of mobile technology. However, these tests require very little body movement and do not consider the influence that physical motion has on cognitive development. Our work mainly focuses on assessing cognition in children through their physical movements. Hence, an assessment test, "Ball-Drop-to-the-Beat," that is both physically and cognitively demanding has been used, in which the child is expected to perform certain actions based on commands. The task is specifically designed to measure attention, response inhibition, and coordination in children. A dataset has been created with 25 children performing this test. To automate the scoring, a computer vision-based assessment system has been developed. The vision system employs an attention-based fusion mechanism to combine multiple modalities such as optical flow, human poses, and objects in the scene to predict a child's action. The proposed method outperforms other state-of-the-art approaches, achieving an average accuracy of 89.8 percent on predicting the actions and an average accuracy of 88.5 percent on predicting the rhythm on the Ball-Drop-to-the-Beat dataset.

Title: Multimodal, Multiparty Modeling of Collaborative Problem Solving Performance
Authors: Shree Krishna Subburaj, Angela E. B. Stewart, A. Rao, S. D’Mello
DOI: https://doi.org/10.1145/3382507.3418877
Published in: Proceedings of the 2020 International Conference on Multimodal Interaction (2020-10-21)
Abstract: Modeling team phenomena from multiparty interactions inherently requires combining signals from multiple teammates, often via weighting strategies. Here, we explored the hypothesis that strategically weighting signals from individual teammates would outperform an equal-weighting baseline. Accordingly, we explored role-, trait-, and behavior-based weighting of behavioral signals across team members. We analyzed data from 101 triads engaged in computer-mediated collaborative problem solving (CPS) in an educational physics game. We investigated the accuracy of machine-learned models trained on facial expressions, acoustic-prosodics, eye gaze, and task context information, computed one minute prior to the end of a game level, at predicting success at solving that level. AUROCs for unimodal models that equally weighted features from the three teammates ranged from .54 to .67, whereas a combination of gaze, face, and task context features achieved an AUROC of .73. The various multiparty weighting strategies did not outperform the equal-weighting baseline. However, our best nonverbal model (AUROC = .73) outperformed a language-based model (AUROC = .67), and there were some advantages to combining the two (AUROC = .75). Finally, models aimed at prospectively predicting performance on a minute-by-minute basis from the start of the level achieved a lower, but still above-chance, AUROC of .60. We discuss implications for multiparty modeling of team performance and other team constructs.
{"title":"How to Complement Learning Analytics with Smartwatches?: Fusing Physical Activities, Environmental Context, and Learning Activities","authors":"George-Petru Ciordas-Hertel","doi":"10.1145/3382507.3421151","DOIUrl":"https://doi.org/10.1145/3382507.3421151","url":null,"abstract":"To obtain a holistic perspective on learning, a multimodal technical infrastructure for Learning Analytics (LA) can be beneficial. Recent studies have investigated various aspects of technical LA infrastructure. However, it has not yet been explored how LA indicators can be complemented with Smartwatch sensor data to detect physical activity and the environmental context. Sensor data, such as the accelerometer, are often used in related work to infer a specific behavior and environmental context, thus triggering interventions on a just-in-time basis. In this dissertation project, we plan to use Smartwatch sensor data to explore further indicators for learning from blended learning sessions conducted in-the-wild, e.g., at home. Such indicators could be used within learning sessions to suggest breaks, or afterward to support learners in reflection processes. We plan to investigate the following three research questions: (RQ1) How can multimodal learning analytics infrastructure be designed to support real-time data acquisition and processing effectively?; (RQ2) how to use smartwatch sensor data to infer environmental context and physical activities to complement learning analytics indicators for blended learning sessions; and (RQ3) how can we align the extracted multimodal indicators with pedagogical interventions. RQ1 was investigated by a structured literature review and by conducting eleven semi-structured interviews with LA infrastructure developers. According to RQ2, we are currently designing and implementing a multimodal learning analytics infrastructure to collect and process sensor and experience data from Smartwatches. Finally, according to RQ3, an exploratory field study will be conducted to extract multimodal learning indicators and examine them with learners and pedagogical experts to develop effective interventions. Researchers, educators, and learners can use and adapt our contributions to gain new insights into learners' time and learning tactics, and physical learning spaces from learning sessions taking place in-the-wild.","PeriodicalId":402394,"journal":{"name":"Proceedings of the 2020 International Conference on Multimodal Interaction","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121458245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Title: FeetBack: Augmenting Robotic Telepresence with Haptic Feedback on the Feet
Authors: Brennan Jones, Jens Maiero, Alireza Mogharrab, I. A. Aguilar, Ashu Adhikari, B. Riecke, E. Kruijff, Carman Neustaedter, R. Lindeman
DOI: https://doi.org/10.1145/3382507.3418820
Published in: Proceedings of the 2020 International Conference on Multimodal Interaction (2020-10-21)
Abstract: Telepresence robots allow people to participate in remote spaces, yet they can be difficult to manoeuvre with people and obstacles around. We designed a haptic-feedback system called "FeetBack," which users place their feet in when driving a telepresence robot. When the robot approaches people or obstacles, haptic proximity and collision feedback are provided on the respective sides of the feet, helping inform users about events that are hard to notice through the robot's camera views. We conducted two studies: one exploring the use of FeetBack in virtual environments, the other focused on real environments. We found that FeetBack can increase spatial presence in simple virtual environments. Users valued the feedback for adjusting their behaviour in both types of environments, though it was sometimes too frequent or unneeded in certain situations after a period of time. These results point to the value of foot-based haptic feedback for telepresence robot systems, as well as the need to design context-sensitive haptic feedback.

Title: Enhancing Affect Detection in Game-Based Learning Environments with Multimodal Conditional Generative Modeling
Authors: Nathan L. Henderson, Wookhee Min, Jonathan P. Rowe, James C. Lester
DOI: https://doi.org/10.1145/3382507.3418892
Published in: Proceedings of the 2020 International Conference on Multimodal Interaction (2020-10-21)
Abstract: Accurately detecting and responding to student affect is a critical capability for adaptive learning environments. Recent years have seen growing interest in modeling student affect with multimodal sensor data. A key challenge in multimodal affect detection is dealing with data loss due to noisy, missing, or invalid multimodal features. Because multimodal affect detection often requires large quantities of data, data loss can have a strong, adverse impact on affect detector performance. To address this issue, we present a multimodal data imputation framework that utilizes conditional generative models to automatically impute posture and interaction log data from student interactions with a game-based learning environment for emergency medical training. We investigate two generative models, a Conditional Generative Adversarial Network (C-GAN) and a Conditional Variational Autoencoder (C-VAE), that are trained using a modality that has undergone varying levels of artificial data masking. The generative models are conditioned on the corresponding intact modality, enabling the data imputation process to capture the interaction between the concurrent modalities. We examine the effectiveness of the conditional generative models on imputation accuracy and its impact on the performance of affect detection. Each imputation model is evaluated using varying amounts of artificial data masking to determine how the data missingness impacts the performance of each imputation method. Results based on the modalities captured from students' interactions with the game-based learning environment indicate that deep conditional generative models within a multimodal data imputation framework yield significant benefits compared to baseline imputation techniques in terms of both imputation accuracy and affective detector performance.