Proceedings of the 20th ACM International Conference on Multimodal Interaction: Latest Publications

If You Ask Nicely: A Digital Assistant Rebuking Impolite Voice Commands
Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3242995
Authors: Michael Bonfert, Maximilian Spliethöver, Roman Arzaroli, Marvin Lange, Martin Hanci, R. Porzel
Abstract: Digital home assistants have an increasing influence on our everyday lives. The media now reports how children adopt the consequential, imperious language style when talking to real people. In response to this behavior, we considered a digital assistant that rebukes impolite language, and investigated how adult users react when rebuked by the AI. In a between-group study (N = 20), participants were rejected by our fictional speech assistant "Eliza" when they made impolite requests. As a result, we observed more polite behavior: most test subjects accepted the AI's demand and said "please" significantly more often. However, many participants retrospectively denied Eliza the entitlement to politeness and criticized her attitude or refusal of service.
Citations: 22
Survival at the Museum: A Cooperation Experiment with Emotionally Expressive Virtual Characters
Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3242984
Authors: Ilaria Torre, E. Carrigan, Killian McCabe, R. Mcdonnell, N. Harte
Abstract: Correctly interpreting an interlocutor's emotional expression is paramount to a successful interaction. But what happens when one of the interlocutors is a machine? Facilitating human-machine communication and cooperation is of growing importance as smartphones, autonomous cars, and social robots increasingly pervade human social spaces. Previous research has shown that emotionally expressive virtual characters generally elicit higher cooperation and trust than 'neutral' ones. Since emotional expressions are multimodal, and since virtual characters can be designed to our liking in all their components, would a mismatch between the emotion expressed in the face and in the voice influence people's cooperation with a virtual character? We developed a game in which people had to cooperate with a virtual character in order to survive on the moon. The character's face and voice were each designed to either smile or not, resulting in four conditions: smiling voice and face, neutral voice and face, smiling voice only (neutral face), and smiling face only (neutral voice). The experiment was set up in a museum over the course of several weeks; we report preliminary results from over 500 visitors, showing that people tend to trust the virtual character more in the mismatched condition with a smiling face and a neutral voice. This might be because the two channels express different aspects of an emotion, as previously suggested.
Citations: 13
SAAMEAT
Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3243685
Authors: F. Haider, Senja Pollak, E. Zarogianni, S. Luz
Abstract: Automatic recognition of human eating conditions could be a useful technology in health monitoring. Audio-visual information can be used to automate this process, and feature engineering approaches can reduce its dimensionality. Reduced dimensionality (particularly feature subset selection) makes it possible to design a system for eating-condition recognition with lower power, cost, memory, and computation requirements than a system designed on the full dimensions of the data. This paper presents Active Feature Transformation (AFT) and Active Feature Selection (AFS) methods and applies them to all three tasks of the ICMI 2018 EAT Challenge for recognizing user eating conditions from audio and visual features. The AFT method is used to transform Mel-frequency cepstral coefficient and ComParE features for the classification task, while the AFS method selects a feature subset. Transformation by Principal Component Analysis (PCA) is also used for comparison. Using the AFS method, we find feature subsets of audio features (422 for Food Type, 104 for Likability, and 68 for Difficulty, out of 988 features) that provide better results than the full feature set. Our results show that AFS outperforms PCA and AFT in terms of accuracy for recognizing user eating conditions from audio features. The AFT of visual features (facial landmarks) gives less accurate results than the AFS and AFT sets of audio features. However, weighted score fusion of all the feature sets improves the results.
Citations: 7
(A hedged code sketch of the feature-selection-versus-PCA comparison follows below.)
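The abstract contrasts selecting a subset of features against projecting them with PCA before classification. The following is a minimal sketch of that general comparison using scikit-learn on synthetic data; the feature matrix, the SVM classifier, and the univariate SelectKBest selector are stand-ins for illustration and are not the authors' AFS/AFT methods or the EAT Challenge data.

```python
# Hedged sketch: feature-subset selection vs. PCA before classification.
# Synthetic data and SelectKBest stand in for the challenge features and
# the paper's AFS/AFT methods, which are not reproduced here.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic "audio feature" matrix: 300 samples x 988 features, 3 classes.
X, y = make_classification(n_samples=300, n_features=988, n_informative=60,
                           n_classes=3, random_state=0)

pipelines = {
    "full features": make_pipeline(StandardScaler(), SVC()),
    "subset (k=104)": make_pipeline(StandardScaler(),
                                    SelectKBest(f_classif, k=104), SVC()),
    "PCA (104 comps)": make_pipeline(StandardScaler(),
                                     PCA(n_components=104), SVC()),
}

for name, pipe in pipelines.items():
    scores = cross_val_score(pipe, X, y, cv=5)
    print(f"{name:>16}: accuracy = {scores.mean():.3f} +/- {scores.std():.3f}")
```

On real data, the interesting outcome is whether the reduced representation matches or beats the full feature set while using far fewer dimensions, which is the trade-off the abstract reports.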
Multimodal Teaching and Learning Analytics for Classroom and Online Educational Settings
Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3264969
Authors: Chinchu Thomas
Abstract: Automatic analysis of teacher-student interactions is an interesting research problem in social computing. Such interactions happen in both online and classroom settings. While teaching effectiveness is the goal in both settings, the mechanism to achieve it may differ between them. To characterize these interactions, multimodal behavioral signals and language use need to be measured, and a model to predict effectiveness needs to be learnt. This would help characterize the teaching skill of the teacher and the level of engagement of the students. Moreover, there could be multiple styles of teaching that are effective.
Citations: 11
3rd International Workshop on Multisensory Approaches to Human-Food Interaction
Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3265860
Authors: A. Nijholt, Carlos Velasco, Marianna Obrist, K. Okajima, C. Spence
Abstract: This is the introduction paper to the third edition of the workshop on 'Multisensory Approaches to Human-Food Interaction', organized at the 20th ACM International Conference on Multimodal Interaction in Boulder, Colorado, on October 16th, 2018. The workshop is a space where the fast-growing research on multisensory human-food interaction is presented. Here we summarize the workshop's key objectives and contributions.
Citations: 6
Large Vocabulary Continuous Audio-Visual Speech Recognition
Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3264976
Authors: George Sterpu
Abstract: We like to converse with other people using both sound and visuals, as our perception of speech is bimodal. Since the two modalities essentially echo the same speech structure, we manage to integrate them and often understand the message better than with our eyes closed. In this work we would like to learn more about the visual nature of speech, known as lip-reading, and to make use of it towards better automatic speech recognition systems. Recent developments in machine learning, together with the release of suitable audio-visual datasets aimed at large-vocabulary continuous speech recognition, have led to a renewal of interest in lip-reading and allow us to address the recurring question of how to better integrate visual and acoustic speech.
Citations: 0
!FTL, an Articulation-Invariant Stroke Gesture Recognizer with Controllable Position, Scale, and Rotation Invariances
Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3243032
Authors: J. Vanderdonckt, Paolo Roselli, J. Pérez-Medina
Abstract: Nearest-neighbor classifiers recognize stroke gestures by computing a (dis)similarity between a candidate gesture and a training set based on points, which may require normalization, resampling, and rotation to a reference before processing. To eliminate this expensive preprocessing, this paper introduces vector-between-vectors recognition, where a gesture is defined by a vector based on geometric algebra and recognition is performed by computing a novel Local Shape Distance (LSD) between vectors. We mathematically prove the LSD's position, scale, and rotation invariance, thus eliminating the preprocessing. To demonstrate the viability of this approach, we instantiate LSD for n=2 and compare !FTL, a 2D stroke-gesture recognizer, with $1 and $P, two state-of-the-art gesture recognizers, on a gesture set typically used for benchmarking. !FTL achieves a recognition rate similar to $P, but with a significantly smaller execution time and a lower algorithmic complexity.
Citations: 41
(A hedged sketch of a nearest-neighbor recognizer built on between-point vectors follows below.)
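The abstract describes comparing gestures through a distance computed on the vectors between consecutive points rather than on the points themselves. The sketch below illustrates that general idea with a simplified, made-up local dissimilarity (relative difference of corresponding between-point vectors); it is not the geometric-algebra LSD defined in the paper, and the template gestures are hypothetical.

```python
# Hedged sketch: nearest-neighbor stroke recognition on between-point vectors.
# The local dissimilarity used here is a simplified stand-in, not the paper's
# geometric-algebra Local Shape Distance (LSD).
import numpy as np

def resample(points: np.ndarray, n: int = 16) -> np.ndarray:
    """Resample a stroke (k x 2 array) to n points, evenly spaced by arc length."""
    seg = np.linalg.norm(np.diff(points, axis=0), axis=1)
    cum = np.concatenate([[0.0], np.cumsum(seg)])
    targets = np.linspace(0.0, cum[-1], n)
    return np.column_stack([np.interp(targets, cum, points[:, d]) for d in range(2)])

def local_shape_dissimilarity(a: np.ndarray, b: np.ndarray) -> float:
    """Compare the vectors between consecutive points of two resampled strokes.

    Working on between-point vectors makes the measure translation invariant;
    normalizing by the vector magnitudes reduces (but does not remove)
    sensitivity to scale.
    """
    u, v = np.diff(a, axis=0), np.diff(b, axis=0)
    num = np.linalg.norm(u - v, axis=1)
    den = np.linalg.norm(u, axis=1) + np.linalg.norm(v, axis=1) + 1e-9
    return float(np.mean(num / den))

def recognize(candidate: np.ndarray, templates: dict) -> str:
    """Nearest-neighbor classification against labeled template strokes."""
    cand = resample(candidate)
    return min(templates,
               key=lambda name: local_shape_dissimilarity(cand, resample(templates[name])))

# Hypothetical templates: a horizontal line and a 'V' shape.
templates = {
    "line": np.array([[0.0, 0.0], [1.0, 0.0]]),
    "vee":  np.array([[0.0, 1.0], [0.5, 0.0], [1.0, 1.0]]),
}
print(recognize(np.array([[0.1, 0.05], [0.9, 0.0]]), templates))  # -> "line"
```

The point of the paper's formulation is that a carefully constructed vector-level distance removes the need for the normalization and rotation steps that $1-style recognizers perform; the sketch only shows where such a distance plugs into a nearest-neighbor pipeline.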
Group-Level Emotion Recognition using Deep Models with A Four-stream Hybrid Network
Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3264987
Authors: Ahmed-Shehab Khan, Zhiyuan Li, Jie Cai, Zibo Meng, James O'Reilly, Yan Tong
Abstract: Group-level Emotion Recognition (GER) in the wild is a challenging task that is gaining considerable attention. Most recent works use two channels of information, a channel containing only faces and a channel containing the whole image, to solve this problem. However, modeling the relationship between faces and scene in a global image remains challenging. In this paper, we propose a novel face-location-aware global network that captures face location information in the form of an attention heatmap to better model such relationships. We also propose a multi-scale face network to infer the group-level emotion from individual faces; it explicitly handles the high variance in image and face size, as images in the wild are collected from different sources at different resolutions. In addition, a global blurred stream is developed to explicitly learn and extract scene-only features. Finally, we propose a four-stream hybrid network, consisting of the face-location-aware global stream, the multi-scale face stream, the global blurred stream, and a global stream, to address the GER task, and show the effectiveness of our method in the GER sub-challenge, part of the sixth Emotion Recognition in the Wild (EmotiW 2018) [10] Challenge. The proposed method achieved 65.59% and 78.39% accuracy on the testing and validation sets, respectively, and ranked third on the leaderboard.
Citations: 22
(A hedged sketch of a generic four-stream late-fusion classifier follows below.)
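The four streams described above are separate encoders whose outputs are combined for a single group-level prediction. Below is a minimal, generic sketch of such a four-stream late-fusion classifier in PyTorch; the MLP encoders, feature sizes, concatenation-based fusion, and three emotion classes are illustrative assumptions, not the architecture from the paper.

```python
# Hedged sketch: a generic four-stream late-fusion classifier in PyTorch.
# Encoder shapes and the concatenation fusion are illustrative assumptions,
# not the networks used in the paper.
import torch
import torch.nn as nn

def encoder(in_dim: int, out_dim: int = 128) -> nn.Module:
    """A small MLP standing in for a per-stream feature extractor (e.g. a CNN backbone)."""
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                         nn.Linear(256, out_dim), nn.ReLU())

class FourStreamFusion(nn.Module):
    def __init__(self, num_classes: int = 3):
        super().__init__()
        # One encoder per stream: face-location-aware global, multi-scale face,
        # global blurred (scene-only), and plain global.
        self.streams = nn.ModuleList([encoder(512) for _ in range(4)])
        self.classifier = nn.Linear(4 * 128, num_classes)

    def forward(self, inputs):  # inputs: list of 4 tensors, each (batch, 512)
        feats = [enc(x) for enc, x in zip(self.streams, inputs)]
        return self.classifier(torch.cat(feats, dim=1))

model = FourStreamFusion()
batch = [torch.randn(8, 512) for _ in range(4)]
print(model(batch).shape)  # torch.Size([8, 3])
```

In practice, each stream could also be trained separately and combined by weighted score fusion instead of feature concatenation; the sketch only shows the structural idea of keeping the streams independent until a final fusion layer.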
A Multimodal Approach for Predicting Changes in PTSD Symptom Severity
Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3242981
Authors: Adria Mallol-Ragolta, Svati Dhamija, T. Boult
Abstract: The rising prevalence of mental illnesses is increasing the demand for new digital tools to support mental wellbeing. Numerous collaborations spanning the fields of psychology, machine learning, and health are building such tools. Machine-learning models that estimate the effects of mental health interventions currently rely on either user self-reports or measurements of user physiology. In this paper, we present a multimodal approach that combines self-reports from questionnaires with skin conductance physiology in a web-based trauma-recovery regime. We evaluate our models on the EASE multimodal dataset and create PTSD symptom severity change estimators at both the total and the cluster level. We demonstrate that modeling total-level PTSD symptom severity change with self-reports can be statistically significantly improved by combining physiology with self-reports, or by using skin conductance measurements alone. Our experiments show that PTSD symptom cluster severity changes are modeled significantly better with our multimodal approach than with self-reports or skin conductance alone when extracting skin conductance features from trigger modules for avoidance, negative alterations in cognition and mood, and alterations in arousal and reactivity symptoms, while performance is statistically similar for the intrusion symptom cluster.
Citations: 20
(A hedged sketch of combining self-report and skin-conductance features in a single regressor follows below.)
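The core comparison above is between predicting a symptom-severity change score from questionnaire features, from skin-conductance features, or from their combination. The sketch below shows that comparison with ridge regression on synthetic arrays; the feature dimensions, the regressor, and the data are assumptions for illustration and do not reflect the EASE dataset or the authors' models.

```python
# Hedged sketch: comparing self-report-only, physiology-only, and combined
# feature sets for predicting a symptom-severity change score.
# Synthetic data and ridge regression stand in for the EASE dataset and the
# estimators used in the paper.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 120
self_report = rng.normal(size=(n, 10))      # e.g. questionnaire item scores
skin_conductance = rng.normal(size=(n, 6))  # e.g. features from trigger modules
# Synthetic target that depends on both modalities.
y = self_report[:, 0] + 0.5 * skin_conductance[:, 0] + 0.1 * rng.normal(size=n)

feature_sets = {
    "self-report only": self_report,
    "skin conductance only": skin_conductance,
    "combined": np.hstack([self_report, skin_conductance]),
}

for name, X in feature_sets.items():
    r2 = cross_val_score(Ridge(alpha=1.0), X, y, cv=5, scoring="r2")
    print(f"{name:>22}: mean R^2 = {r2.mean():.3f}")
```

The paper's claim corresponds to the combined (or physiology-only) configuration outperforming the self-report-only configuration on held-out data, with significance assessed statistically rather than by raw score alone.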
Understanding Mobile Reading via Camera Based Gaze Tracking and Kinematic Touch Modeling
Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3243011
Authors: Wei Guo, Jingtao Wang
Abstract: Despite the ubiquity and rapid growth of mobile reading activities, researchers and practitioners today rely either on coarse-grained metrics such as click-through rate (CTR) and dwell time, or on expensive equipment such as gaze trackers, to understand users' reading behavior on mobile devices. We present Lepton, an intelligent mobile reading system and a set of dual-channel sensing algorithms that achieve scalable and fine-grained understanding of users' reading behaviors, comprehension, and engagement on unmodified smartphones. Lepton tracks the periodic lateral patterns, i.e. saccades, of users' eye gaze via the front camera, and infers their muscle stiffness during text scrolling via a Mass-Spring-Damper (MSD) based kinematic model driven by touch events. Through a 25-participant study, we found that both the periodic saccade patterns and the muscle stiffness signals captured by Lepton can be used as expressive features to infer users' comprehension and engagement in mobile reading. Overall, our new signals lead to significantly higher performance than traditional dwell-time-based features in predicting users' comprehension (correlation: 0.36 vs. 0.29), concentration (0.36 vs. 0.16), confidence (0.5 vs. 0.47), and engagement (0.34 vs. 0.16) in a user-independent model.
Citations: 6
(A hedged sketch of fitting mass-spring-damper parameters to scroll kinematics follows below.)
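The abstract mentions inferring muscle stiffness from scrolling via a mass-spring-damper (MSD) model. Below is a minimal sketch of one way to fit MSD-style stiffness and damping coefficients to a 1-D trajectory by least squares on finite-difference estimates of velocity and acceleration; the synthetic trajectory and the fitting scheme are assumptions for illustration, not Lepton's actual kinematic model.

```python
# Hedged sketch: estimating stiffness (k) and damping (c) of a 1-D
# mass-spring-damper model  a(t) = -k * x(t) - c * v(t)  from a sampled
# trajectory, using finite differences and least squares.
# The trajectory below is synthetic; this is not Lepton's actual model.
import numpy as np

def fit_msd(t: np.ndarray, x: np.ndarray) -> tuple[float, float]:
    """Fit k and c in a ~ -k*x - c*v from timestamps t and positions x."""
    v = np.gradient(x, t)          # finite-difference velocity
    a = np.gradient(v, t)          # finite-difference acceleration
    A = np.column_stack([-x, -v])  # regressors for [k, c]
    (k, c), *_ = np.linalg.lstsq(A, a, rcond=None)
    return float(k), float(c)

# Synthetic underdamped trajectory with known k=9, c=1 (unit mass).
t = np.linspace(0.0, 4.0, 400)
k_true, c_true = 9.0, 1.0
omega = np.sqrt(k_true - (c_true / 2.0) ** 2)
x = np.exp(-c_true * t / 2.0) * np.cos(omega * t)

print(fit_msd(t, x))  # should recover roughly (9.0, 1.0)
```

In a touch-sensing setting, t and x would come from scroll-event timestamps and offsets, and the fitted stiffness-like coefficient would serve as a per-scroll feature alongside the gaze-based saccade features.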