Proceedings of the 20th ACM International Conference on Multimodal Interaction: Latest Publications

If You Ask Nicely: A Digital Assistant Rebuking Impolite Voice Commands
Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3242995
Authors: Michael Bonfert, Maximilian Spliethöver, Roman Arzaroli, Marvin Lange, Martin Hanci, R. Porzel
Abstract: Digital home assistants have an increasing influence on our everyday lives. The media now reports how children adopt the consequential, imperious language style when talking to real people. In response to this behavior, we considered a digital assistant that rebukes impolite language, and investigated how adult users react when rebuked by the AI. In a between-group study (N = 20), participants were rejected by our fictional speech assistant "Eliza" when they made impolite requests. As a result, we observed more polite behavior: most test subjects accepted the AI's demand and said "please" significantly more often. However, many participants retrospectively denied Eliza the entitlement to politeness and criticized her attitude or refusal of service.
Citations: 22
Survival at the Museum: A Cooperation Experiment with Emotionally Expressive Virtual Characters
Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3242984
Authors: Ilaria Torre, E. Carrigan, Killian McCabe, R. Mcdonnell, N. Harte
Abstract: Correctly interpreting an interlocutor's emotional expression is paramount to a successful interaction. But what happens when one of the interlocutors is a machine? Facilitating human-machine communication and cooperation is of growing importance as smartphones, autonomous cars, and social robots increasingly pervade human social spaces. Previous research has shown that emotionally expressive virtual characters generally elicit higher cooperation and trust than 'neutral' ones. Since emotional expressions are multimodal, and since virtual characters can be designed to our liking in all their components, would a mismatch between the emotion expressed in the face and in the voice influence people's cooperation with a virtual character? We developed a game in which people had to cooperate with a virtual character in order to survive on the moon. The character's face and voice were each designed to either smile or not, resulting in four conditions: smiling voice and face, neutral voice and face, smiling voice only (neutral face), and smiling face only (neutral voice). The experiment was set up in a museum over the course of several weeks; we report preliminary results from over 500 visitors, showing that people tend to trust the virtual character more in the mismatched condition with a smiling face and a neutral voice. This might be because the two channels express different aspects of an emotion, as previously suggested.
Citations: 13
SAAMEAT
Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3243685
Authors: F. Haider, Senja Pollak, E. Zarogianni, S. Luz
Abstract: Automatic recognition of human eating conditions could be a useful technology in health monitoring. Audio-visual information can be used to automate this process, and feature engineering approaches can reduce its dimensionality. Reduced dimensionality (particularly feature subset selection) makes it possible to design a system for eating-condition recognition with lower power, cost, memory, and computation requirements than a system designed on the full dimensions of the data. This paper presents Active Feature Transformation (AFT) and Active Feature Selection (AFS) methods and applies them to all three tasks of the ICMI 2018 EAT Challenge for recognizing user eating conditions from audio and visual features. The AFT method is used to transform Mel-frequency cepstral coefficient and ComParE features for the classification task, while the AFS method selects a feature subset. Transformation by Principal Component Analysis (PCA) is also used for comparison. Using the AFS method, we find feature subsets of audio features (422 for Food Type, 104 for Likability, and 68 for Difficulty, out of 988 features) that provide better results than the full feature set. Our results show that AFS outperforms PCA and AFT in terms of accuracy for recognizing user eating conditions from audio features. The AFT of visual features (facial landmarks) gives less accurate results than the AFS and AFT sets of audio features. However, weighted score fusion of all the feature sets improves the results.
Citations: 7
(A hedged code sketch of the feature-selection-versus-PCA comparison follows below.)
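The abstract contrasts selecting a subset of features against projecting them with PCA before classification. The following is a minimal sketch of that general comparison using scikit-learn on synthetic data; the feature matrix, the SVM classifier, and the univariate SelectKBest selector are stand-ins for illustration and are not the authors' AFS/AFT methods or the EAT Challenge data.

```python
# Hedged sketch: feature-subset selection vs. PCA before classification.
# Synthetic data and SelectKBest stand in for the challenge features and
# the paper's AFS/AFT methods, which are not reproduced here.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic "audio feature" matrix: 300 samples x 988 features, 3 classes.
X, y = make_classification(n_samples=300, n_features=988, n_informative=60,
                           n_classes=3, random_state=0)

pipelines = {
    "full features": make_pipeline(StandardScaler(), SVC()),
    "subset (k=104)": make_pipeline(StandardScaler(),
                                    SelectKBest(f_classif, k=104), SVC()),
    "PCA (104 comps)": make_pipeline(StandardScaler(),
                                     PCA(n_components=104), SVC()),
}

for name, pipe in pipelines.items():
    scores = cross_val_score(pipe, X, y, cv=5)
    print(f"{name:>16}: accuracy = {scores.mean():.3f} +/- {scores.std():.3f}")
```

On real data, the interesting outcome is whether the reduced representation matches or beats the full feature set while using far fewer dimensions, which is the trade-off the abstract reports.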
Multimodal Teaching and Learning Analytics for Classroom and Online Educational Settings
Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3264969
Authors: Chinchu Thomas
Abstract: Automatic analysis of teacher-student interactions is an interesting research problem in social computing. Such interactions happen in both online and classroom settings. While teaching effectiveness is the goal in both settings, the mechanism to achieve it may differ between them. To characterize these interactions, multimodal behavioral signals and language use need to be measured, and a model to predict effectiveness needs to be learnt. This would help characterize the teaching skill of the teacher and the level of engagement of the students. Moreover, there could be multiple styles of teaching that are effective.
Citations: 11
3rd International Workshop on Multisensory Approaches to Human-Food Interaction
Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3265860
Authors: A. Nijholt, Carlos Velasco, Marianna Obrist, K. Okajima, C. Spence
Abstract: This is the introduction paper to the third edition of the workshop on 'Multisensory Approaches to Human-Food Interaction', organized at the 20th ACM International Conference on Multimodal Interaction in Boulder, Colorado, on October 16th, 2018. The workshop is a space where the fast-growing research on multisensory human-food interaction is presented. Here we summarize the workshop's key objectives and contributions.
Citations: 6
Large Vocabulary Continuous Audio-Visual Speech Recognition
Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3264976
Authors: George Sterpu
Abstract: We like to converse with other people using both sound and visuals, as our perception of speech is bimodal. Since the two modalities essentially echo the same speech structure, we manage to integrate them and often understand the message better than with our eyes closed. In this work we would like to learn more about the visual nature of speech, known as lip-reading, and to make use of it towards better automatic speech recognition systems. Recent developments in machine learning, together with the release of suitable audio-visual datasets aimed at large-vocabulary continuous speech recognition, have led to a renewal of interest in lip-reading and allow us to address the recurring question of how to better integrate visual and acoustic speech.
Citations: 0
!FTL, an Articulation-Invariant Stroke Gesture Recognizer with Controllable Position, Scale, and Rotation Invariances
Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3243032
Authors: J. Vanderdonckt, Paolo Roselli, J. Pérez-Medina
Abstract: Nearest-neighbor classifiers recognize stroke gestures by computing a (dis)similarity between a candidate gesture and a training set based on points, which may require normalization, resampling, and rotation to a reference before processing. To eliminate this expensive preprocessing, this paper introduces vector-between-vectors recognition, where a gesture is defined by a vector based on geometric algebra and recognition is performed by computing a novel Local Shape Distance (LSD) between vectors. We mathematically prove the LSD's position, scale, and rotation invariance, thus eliminating the preprocessing. To demonstrate the viability of this approach, we instantiate LSD for n=2 and compare !FTL, a 2D stroke-gesture recognizer, with $1 and $P, two state-of-the-art gesture recognizers, on a gesture set typically used for benchmarking. !FTL achieves a recognition rate similar to $P, but with a significantly smaller execution time and a lower algorithmic complexity.
Citations: 41
(A hedged sketch of a nearest-neighbor recognizer built on between-point vectors follows below.)
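The abstract describes comparing gestures through a distance computed on the vectors between consecutive points rather than on the points themselves. The sketch below illustrates that general idea with a simplified, made-up local dissimilarity (relative difference of corresponding between-point vectors); it is not the geometric-algebra LSD defined in the paper, and the template gestures are hypothetical.

```python
# Hedged sketch: nearest-neighbor stroke recognition on between-point vectors.
# The local dissimilarity used here is a simplified stand-in, not the paper's
# geometric-algebra Local Shape Distance (LSD).
import numpy as np

def resample(points: np.ndarray, n: int = 16) -> np.ndarray:
    """Resample a stroke (k x 2 array) to n points, evenly spaced by arc length."""
    seg = np.linalg.norm(np.diff(points, axis=0), axis=1)
    cum = np.concatenate([[0.0], np.cumsum(seg)])
    targets = np.linspace(0.0, cum[-1], n)
    return np.column_stack([np.interp(targets, cum, points[:, d]) for d in range(2)])

def local_shape_dissimilarity(a: np.ndarray, b: np.ndarray) -> float:
    """Compare the vectors between consecutive points of two resampled strokes.

    Working on between-point vectors makes the measure translation invariant;
    normalizing by the vector magnitudes reduces (but does not remove)
    sensitivity to scale.
    """
    u, v = np.diff(a, axis=0), np.diff(b, axis=0)
    num = np.linalg.norm(u - v, axis=1)
    den = np.linalg.norm(u, axis=1) + np.linalg.norm(v, axis=1) + 1e-9
    return float(np.mean(num / den))

def recognize(candidate: np.ndarray, templates: dict) -> str:
    """Nearest-neighbor classification against labeled template strokes."""
    cand = resample(candidate)
    return min(templates,
               key=lambda name: local_shape_dissimilarity(cand, resample(templates[name])))

# Hypothetical templates: a horizontal line and a 'V' shape.
templates = {
    "line": np.array([[0.0, 0.0], [1.0, 0.0]]),
    "vee":  np.array([[0.0, 1.0], [0.5, 0.0], [1.0, 1.0]]),
}
print(recognize(np.array([[0.1, 0.05], [0.9, 0.0]]), templates))  # -> "line"
```

The point of the paper's formulation is that a carefully constructed vector-level distance removes the need for the normalization and rotation steps that $1-style recognizers perform; the sketch only shows where such a distance plugs into a nearest-neighbor pipeline.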
Group-Level Emotion Recognition using Deep Models with A Four-stream Hybrid Network
Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3264987
Authors: Ahmed-Shehab Khan, Zhiyuan Li, Jie Cai, Zibo Meng, James O'Reilly, Yan Tong
Abstract: Group-level Emotion Recognition (GER) in the wild is a challenging task that is gaining considerable attention. Most recent works use two channels of information, a channel containing only faces and a channel containing the whole image, to solve this problem. However, modeling the relationship between faces and scene in a global image remains challenging. In this paper, we propose a novel face-location-aware global network that captures face location information in the form of an attention heatmap to better model such relationships. We also propose a multi-scale face network to infer the group-level emotion from individual faces; it explicitly handles the high variance in image and face size, as images in the wild are collected from different sources at different resolutions. In addition, a global blurred stream is developed to explicitly learn and extract scene-only features. Finally, we propose a four-stream hybrid network, consisting of the face-location-aware global stream, the multi-scale face stream, the global blurred stream, and a global stream, to address the GER task, and show the effectiveness of our method in the GER sub-challenge, part of the sixth Emotion Recognition in the Wild (EmotiW 2018) [10] Challenge. The proposed method achieved 65.59% and 78.39% accuracy on the testing and validation sets, respectively, and ranked third on the leaderboard.
Citations: 22
(A hedged sketch of a generic four-stream late-fusion classifier follows below.)
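The four streams described above are separate encoders whose outputs are combined for a single group-level prediction. Below is a minimal, generic sketch of such a four-stream late-fusion classifier in PyTorch; the MLP encoders, feature sizes, concatenation-based fusion, and three emotion classes are illustrative assumptions, not the architecture from the paper.

```python
# Hedged sketch: a generic four-stream late-fusion classifier in PyTorch.
# Encoder shapes and the concatenation fusion are illustrative assumptions,
# not the networks used in the paper.
import torch
import torch.nn as nn

def encoder(in_dim: int, out_dim: int = 128) -> nn.Module:
    """A small MLP standing in for a per-stream feature extractor (e.g. a CNN backbone)."""
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                         nn.Linear(256, out_dim), nn.ReLU())

class FourStreamFusion(nn.Module):
    def __init__(self, num_classes: int = 3):
        super().__init__()
        # One encoder per stream: face-location-aware global, multi-scale face,
        # global blurred (scene-only), and plain global.
        self.streams = nn.ModuleList([encoder(512) for _ in range(4)])
        self.classifier = nn.Linear(4 * 128, num_classes)

    def forward(self, inputs):  # inputs: list of 4 tensors, each (batch, 512)
        feats = [enc(x) for enc, x in zip(self.streams, inputs)]
        return self.classifier(torch.cat(feats, dim=1))

model = FourStreamFusion()
batch = [torch.randn(8, 512) for _ in range(4)]
print(model(batch).shape)  # torch.Size([8, 3])
```

In practice, each stream could also be trained separately and combined by weighted score fusion instead of feature concatenation; the sketch only shows the structural idea of keeping the streams independent until a final fusion layer.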
A Multimodal Approach for Predicting Changes in PTSD Symptom Severity
Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3242981
Authors: Adria Mallol-Ragolta, Svati Dhamija, T. Boult
Abstract: The rising prevalence of mental illnesses is increasing the demand for new digital tools to support mental wellbeing. Numerous collaborations spanning the fields of psychology, machine learning, and health are building such tools. Machine-learning models that estimate the effects of mental health interventions currently rely on either user self-reports or measurements of user physiology. In this paper, we present a multimodal approach that combines self-reports from questionnaires with skin conductance physiology in a web-based trauma-recovery regime. We evaluate our models on the EASE multimodal dataset and create PTSD symptom severity change estimators at both the total and the cluster level. We demonstrate that modeling total-level PTSD symptom severity change with self-reports can be statistically significantly improved by combining physiology with self-reports, or by using skin conductance measurements alone. Our experiments show that PTSD symptom cluster severity changes are modeled significantly better with our multimodal approach than with self-reports or skin conductance alone when extracting skin conductance features from trigger modules for avoidance, negative alterations in cognition and mood, and alterations in arousal and reactivity symptoms, while performance is statistically similar for the intrusion symptom cluster.
Citations: 20
(A hedged sketch of combining self-report and skin-conductance features in a single regressor follows below.)
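The core comparison above is between predicting a symptom-severity change score from questionnaire features, from skin-conductance features, or from their combination. The sketch below shows that comparison with ridge regression on synthetic arrays; the feature dimensions, the regressor, and the data are assumptions for illustration and do not reflect the EASE dataset or the authors' models.

```python
# Hedged sketch: comparing self-report-only, physiology-only, and combined
# feature sets for predicting a symptom-severity change score.
# Synthetic data and ridge regression stand in for the EASE dataset and the
# estimators used in the paper.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 120
self_report = rng.normal(size=(n, 10))      # e.g. questionnaire item scores
skin_conductance = rng.normal(size=(n, 6))  # e.g. features from trigger modules
# Synthetic target that depends on both modalities.
y = self_report[:, 0] + 0.5 * skin_conductance[:, 0] + 0.1 * rng.normal(size=n)

feature_sets = {
    "self-report only": self_report,
    "skin conductance only": skin_conductance,
    "combined": np.hstack([self_report, skin_conductance]),
}

for name, X in feature_sets.items():
    r2 = cross_val_score(Ridge(alpha=1.0), X, y, cv=5, scoring="r2")
    print(f"{name:>22}: mean R^2 = {r2.mean():.3f}")
```

The paper's claim corresponds to the combined (or physiology-only) configuration outperforming the self-report-only configuration on held-out data, with significance assessed statistically rather than by raw score alone.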
Understanding Mobile Reading via Camera Based Gaze Tracking and Kinematic Touch Modeling
Pub Date: 2018-10-02 | DOI: 10.1145/3242969.3243011
Authors: Wei Guo, Jingtao Wang
Abstract: Despite the ubiquity and rapid growth of mobile reading activities, researchers and practitioners today rely either on coarse-grained metrics such as click-through rate (CTR) and dwell time, or on expensive equipment such as gaze trackers, to understand users' reading behavior on mobile devices. We present Lepton, an intelligent mobile reading system and a set of dual-channel sensing algorithms that achieve scalable and fine-grained understanding of users' reading behaviors, comprehension, and engagement on unmodified smartphones. Lepton tracks the periodic lateral patterns, i.e. saccades, of users' eye gaze via the front camera, and infers their muscle stiffness during text scrolling via a Mass-Spring-Damper (MSD) based kinematic model driven by touch events. Through a 25-participant study, we found that both the periodic saccade patterns and the muscle stiffness signals captured by Lepton can be used as expressive features to infer users' comprehension and engagement in mobile reading. Overall, our new signals lead to significantly higher performance than traditional dwell-time-based features in predicting users' comprehension (correlation: 0.36 vs. 0.29), concentration (0.36 vs. 0.16), confidence (0.5 vs. 0.47), and engagement (0.34 vs. 0.16) in a user-independent model.
Citations: 6
(A hedged sketch of fitting mass-spring-damper parameters to scroll kinematics follows below.)
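The abstract mentions inferring muscle stiffness from scrolling via a mass-spring-damper (MSD) model. Below is a minimal sketch of one way to fit MSD-style stiffness and damping coefficients to a 1-D trajectory by least squares on finite-difference estimates of velocity and acceleration; the synthetic trajectory and the fitting scheme are assumptions for illustration, not Lepton's actual kinematic model.

```python
# Hedged sketch: estimating stiffness (k) and damping (c) of a 1-D
# mass-spring-damper model  a(t) = -k * x(t) - c * v(t)  from a sampled
# trajectory, using finite differences and least squares.
# The trajectory below is synthetic; this is not Lepton's actual model.
import numpy as np

def fit_msd(t: np.ndarray, x: np.ndarray) -> tuple[float, float]:
    """Fit k and c in a ~ -k*x - c*v from timestamps t and positions x."""
    v = np.gradient(x, t)          # finite-difference velocity
    a = np.gradient(v, t)          # finite-difference acceleration
    A = np.column_stack([-x, -v])  # regressors for [k, c]
    (k, c), *_ = np.linalg.lstsq(A, a, rcond=None)
    return float(k), float(c)

# Synthetic underdamped trajectory with known k=9, c=1 (unit mass).
t = np.linspace(0.0, 4.0, 400)
k_true, c_true = 9.0, 1.0
omega = np.sqrt(k_true - (c_true / 2.0) ** 2)
x = np.exp(-c_true * t / 2.0) * np.cos(omega * t)

print(fit_msd(t, x))  # should recover roughly (9.0, 1.0)
```

In a touch-sensing setting, t and x would come from scroll-event timestamps and offsets, and the fitted stiffness-like coefficient would serve as a per-scroll feature alongside the gaze-based saccade features.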