Exploring A New Method for Food Likability Rating Based on DT-CWT Theory
Yanan Guo, Jing Han, Zixing Zhang, Björn Schuller, Yide Ma
Proceedings of the 20th ACM International Conference on Multimodal Interaction (ICMI 2018). DOI: 10.1145/3242969.3243684
Abstract: In this paper, we investigate subjects' food likability based on audio-related features as a contribution to EAT, the ICMI 2018 Eating Analysis and Tracking challenge. Specifically, we conduct a 4-level Dual-Tree Complex Wavelet Transform (DT-CWT) decomposition of each audio signal and obtain five sub-signals with frequencies ranging from low to high. For each sub-signal, we calculate not only 'traditional' functional-based features but also deep learning-based features via pretrained CNNs applied to the sliCQ non-stationary Gabor transform and a cochleagram map. In addition, Bag-of-Audio-Words features extracted from the original audio signals with the openXBOW toolkit are used to enhance the model. Finally, early fusion of these three kinds of features leads to promising results, yielding the highest UAR of 79.2% under leave-one-speaker-out cross-validation, a 12.7% absolute gain over the baseline of 66.5% UAR.

{"title":"Smart Arse: Posture Classification with Textile Sensors in Trousers","authors":"Sophie Skach, R. Stewart, P. Healey","doi":"10.1145/3242969.3242977","DOIUrl":"https://doi.org/10.1145/3242969.3242977","url":null,"abstract":"Body posture is a good indicator of, amongst other things, people's state of arousal, focus of attention and level of interest in a conversation. Posture is conventionally measured by observation and hand coding of videos or, more recently, through automated computer vision and motion capture techniques. Here we introduce a novel alternative approach exploiting a new modality: posture classification using bespoke 'smart' trousers with integrated textile pressure sensors. Changes in posture translate to changes in pressure patterns across the surface of our clothing. We describe the construction of the textile pressure sensor and, using simple machine learning techniques on data gathered from 10 participants, demonstrate its ability to discriminate between 19 different basic posture types with high accuracy. This technology has the potential to support anonymous, unintrusive sensing of interest, attention and engagement in a wide variety of settings.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127089392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Human-Habitat for Health (H3): Human-habitat Multimodal Interaction for Promoting Health and Well-being in the Internet of Things Era
Theodora Chaspari, A. Metallinou, L. S. Duker, A. Behzadan
Proceedings of the 20th ACM International Conference on Multimodal Interaction (ICMI 2018). DOI: 10.1145/3242969.3265862
Abstract: This paper presents an introduction to the "Human-Habitat for Health (H3): Human-habitat multimodal interaction for promoting health and well-being in the Internet of Things era" workshop, which was held at the 20th ACM International Conference on Multimodal Interaction on October 16th, 2018, in Boulder, CO, USA. The main theme of the workshop was the effect of the physical or virtual environment on individuals' behavior, well-being, and health. The H3 workshop included keynote speeches that provided an overview and future directions of the field, as well as presentations of position papers and research contributions. The workshop brought together experts from academia and industry spanning a set of multi-disciplinary fields, including computer science, speech and spoken language understanding, construction science, life sciences, health sciences, and psychology, to discuss their respective views and identify synergistic and converging research directions and solutions.

{"title":"Multimodal Modeling of Coordination and Coregulation Patterns in Speech Rate during Triadic Collaborative Problem Solving","authors":"Angela E. B. Stewart, Z. Keirn, S. D’Mello","doi":"10.1145/3242969.3242989","DOIUrl":"https://doi.org/10.1145/3242969.3242989","url":null,"abstract":"We model coordination and coregulation patterns in 33 triads engaged in collaboratively solving a challenging computer programming task for approximately 20 minutes. Our goal is to prospectively model speech rate (words/sec) - an important signal of turn taking and active participation - of one teammate (A or B or C) from time lagged nonverbal signals (speech rate and acoustic-prosodic features) of the other two (i.e., A + B → C; A + C → B; B + C → A) and task-related context features. We trained feed-forward neural networks (FFNNs) and long short-term memory recurrent neural networks (LSTMs) using group-level nested cross-validation. LSTMs outperformed FFNNs and a chance baseline and could predict speech rate up to 6s into the future. A multimodal combination of speech rate, acoustic-prosodic, and task context features outperformed unimodal and bimodal signals. The extent to which the models could predict an individual's speech rate was positively related to that individual's scores on a subsequent posttest, suggesting a link between coordination/coregulation and collaborative learning outcomes. We discuss applications of the models for real-time systems that monitor the collaborative process and intervene to promote positive collaborative outcomes.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114351994","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interpretable Multimodal Deception Detection in Videos","authors":"Hamid Karimi","doi":"10.1145/3242969.3264967","DOIUrl":"https://doi.org/10.1145/3242969.3264967","url":null,"abstract":"There are various real-world applications such as video ads, airport screenings, courtroom trials, and job interviews where deception detection can play a crucial role. Hence, there are immense demands on deception detection in videos. Videos contain rich information including acoustic, visual, temporal, and/or linguistic information, which provides great opportunities for advanced deception detection. However, videos are inherently complex; moreover, they lack detective labels in many real-world applications, which poses tremendous challenges to traditional deception detection. In this manuscript, I present my Ph.D. research on the problem of deception detection in videos. In particular, I provide a principled way to capture rich information into a coherent model and propose an end-to-end framework DEV to detect DEceptive Videos automatically. Preliminary results on real-world videos demonstrate the effectiveness of the proposed framework.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"97 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114744195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hand, Foot or Voice: Alternative Input Modalities for Touchless Interaction in the Medical Domain","authors":"Benjamin Hatscher, C. Hansen","doi":"10.1145/3242969.3242971","DOIUrl":"https://doi.org/10.1145/3242969.3242971","url":null,"abstract":"During medical interventions, direct interaction with medical image data is a cumbersome task for physicians due to the sterile environment. Even though touchless input via hand, foot or voice is possible, these modalities are not available for these tasks all the time. Therefore, we investigated touchless input methods as alternatives to each other with focus on two common interaction tasks in sterile settings: activation of a system to avoid unintentional input and manipulation of continuous values. We created a system where activation could be achieved via voice, hand or foot gestures and continuous manipulation via hand and foot gestures. We conducted a comparative user study and found that foot interaction performed best in terms of task completion times and scored highest in the subjectively assessed measures usability and usefulness. Usability and usefulness scores for hand and voice were only slightly worse and all participants were able to perform all tasks in a sufficient short amount of time. This work contributes by proposing methods to interact with computers in sterile, dynamic environments and by providing evaluation results for direct comparison of alternative modalities for common interaction tasks.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127881205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling Empathy in Embodied Conversational Agents: Extended Abstract","authors":"Ö. Yalçın","doi":"10.1145/3242969.3264977","DOIUrl":"https://doi.org/10.1145/3242969.3264977","url":null,"abstract":"This paper is intended to outline the PhD research that is aimed to model empathy in embodied conversational systems. Our goal is to determine the requirements for implementation of an empathic interactive agent and develop evaluation methods that is aligned with the empathy research from various fields. The thesis is composed of three scientific contributions: (i) developing a computational model of empathy, (ii) implementation of the model in embodied conversational agents and (iii) enhance the understanding of empathy in interaction by generating data and build evaluation tools. The paper will give results for the contribution (i) and preliminary results for contribution (ii). Moreover, we will present the future plan for contribution (ii) and (iii).","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132844090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimating Visual Focus of Attention in Multiparty Meetings using Deep Convolutional Neural Networks","authors":"K. Otsuka, Keisuke Kasuga, Martina Köhler","doi":"10.1145/3242969.3242973","DOIUrl":"https://doi.org/10.1145/3242969.3242973","url":null,"abstract":"Convolutional neural networks (CNNs) are employed to estimate the visual focus of attention (VFoA), also called gaze direction , in multiparty face-to-face meetings on the basis of multimodal nonverbal behaviors including head pose, direction of the eyeball, and presence/absence of utterance. To reveal the potential of CNNs, we focus on aspects of multimodal and multiparty fusion including individual/group models, early/late fusion, and robustness when using inputs from image-based trackers. In contrast to the individual model that separately targets each person specific to one's seat, the group model aims to jointly estimate the gaze directions of all participants. Experiments confirmed that the group model outperformed the individual model especially in predicting listeners' VFoA when the inputs did not include eyeball directions. This result indicates that the group CNN model can implicitly learn underlying conversation structures, e.g., the listeners' gazes converge on the speaker. When the eyeball direction feature is available, both models outperformed the Bayes models used for comparison. In this case, the individual model was superior to the group model, particularly in estimating the speaker's VFoA. Moreover, it was revealed that in group models, two-stage late fusion, which integrates an individual features first, and multiparty features second, outperformed other structures. Furthermore, our experiment confirmed that image-based tracking can provide a comparable level of performance to that of sensor-based measurements. Overall, the results suggest that the CNN is a promising approach for VFoA estimation.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130849081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Interlocutor-Modulated Attention BLSTM to Predict Personality Traits in Small Group Interaction","authors":"Yun-Shao Lin, Chi-Chun Lee","doi":"10.1145/3242969.3243001","DOIUrl":"https://doi.org/10.1145/3242969.3243001","url":null,"abstract":"Small group interaction occurs often in workplace and education settings. Its dynamic progression is an essential factor in dictating the final group performance outcomes. The personality of each individual within the group is reflected in his/her interpersonal behaviors with other members of the group as they engage in these task-oriented interactions. In this work, we propose an interlocutor-modulated attention BSLTM (IM-aBLSTM) architecture that models an individual's vocal behaviors during small group interactions in order to automatically infer his/her personality traits. The interlocutor-modulated attention mechanism jointly optimize the relevant interpersonal vocal behaviors of other members of group during interactions. In specifics, we evaluate our proposed IM-aBLSTM in one of the largest small group interaction database, the ELEA corpus. Our framework achieves a promising unweighted recall accuracy of 87.9% in ten different binary personality trait prediction tasks, which outperforms the best results previously reported on the same database by 10.4% absolute. Finally, by analyzing the interpersonal vocal behaviors in the region of high attention weights, we observe several distinct intra- and inter-personal vocal behavior patterns that vary as a function of personality traits.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133581533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Ensemble Model Using Face and Body Tracking for Engagement Detection","authors":"Cheng Chang, Cheng Zhang, L. Chen, Yang Liu","doi":"10.1145/3242969.3264986","DOIUrl":"https://doi.org/10.1145/3242969.3264986","url":null,"abstract":"Precise detection and localization of learners' engagement levels are useful for monitoring their learning quality. In the emotiW Challenge's engagement detection task, we proposed a series of novel improvements, including (a) a cluster-based framework for fast engagement level predictions, (b) a neural network using the attention pooling mechanism, (c) heuristic rules using body posture information, and (d) model ensemble for more accurate and robust predictions. Our experimental results suggest that our proposed methods effectively improved engagement detection performance. On the validation set, our system can reduce the baseline Mean Squared Error (MSE) by about 56%. On the final test set, our system yielded a competitively low MSE of 0.081.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"604 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132728287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}