Proceedings of the 2020 International Conference on Multimodal Interaction: Latest Publications

Extract the Gaze Multi-dimensional Information Analysis Driver Behavior
Proceedings of the 2020 International Conference on Multimodal Interaction Pub Date : 2020-10-21 DOI: 10.1145/3382507.3417972
Kui Lyu, Minghao Wang, Liyu Meng
{"title":"Extract the Gaze Multi-dimensional Information Analysis Driver Behavior","authors":"Kui Lyu, Minghao Wang, Liyu Meng","doi":"10.1145/3382507.3417972","DOIUrl":"https://doi.org/10.1145/3382507.3417972","url":null,"abstract":"Recent studies has been shown that most traffic accidents are related to the driver's engagement in the driving process. Driver gaze is considered as an important cue to monitor driver distraction. While there has been marked improvement in driver gaze region estimation systems, but there are many challenges exist like cross subject test, perspectives and sensor configuration. In this paper, we propose a Convolutional Neural Networks (CNNs) based multi-model fusion gaze zone estimation systems. Our method mainly consists of two blocks, which implemented the extraction of gaze features based on RGB images and estimation of gaze based on head pose features. Based on the original input image, first general face processing model were used to detect face and localize 3D landmarks, and then extract the most relevant facial information based on it. We implement three face alignment methods to normalize the face information. For the above image-based features, using a multi-input CNN classifier can get reliable classification accuracy. In addition, we design a 2D CNN based PointNet predict the head pose representation by 3D landmarks. Finally, we evaluate our best performance model on the Eighth EmotiW Driver Gaze Prediction sub-challenge test dataset. Our model has a competitive overall accuracy of 81.5144% gaze zone estimation ability on the cross-subject test dataset.","PeriodicalId":402394,"journal":{"name":"Proceedings of the 2020 International Conference on Multimodal Interaction","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125050975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
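The two-branch design described in the abstract above (image-based gaze features fused with a landmark-driven head-pose branch) can be illustrated with a minimal PyTorch sketch. This is not the authors' released code; the layer sizes, the nine gaze zones, the 68 landmarks, and the input resolution are assumptions chosen purely for illustration.

```python
import torch
import torch.nn as nn

class GazeZoneFusionNet(nn.Module):
    """Two-branch fusion sketch: a CNN over an aligned face crop plus a
    PointNet-style branch over 3D facial landmarks, concatenated before a
    gaze-zone classifier. All sizes are illustrative, not the paper's."""

    def __init__(self, num_zones=9, num_landmarks=68):
        super().__init__()
        # Image branch: small CNN over a 3x128x128 aligned face crop.
        self.image_branch = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Landmark branch: shared per-point MLP followed by max pooling,
        # loosely in the spirit of PointNet, applied to (x, y, z) landmarks.
        self.point_mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
        )
        # Fusion classifier over the concatenated branch features.
        self.classifier = nn.Sequential(
            nn.Linear(64 + 128, 128), nn.ReLU(),
            nn.Linear(128, num_zones),
        )

    def forward(self, face_crop, landmarks_3d):
        img_feat = self.image_branch(face_crop)                    # (B, 64)
        pt_feat = self.point_mlp(landmarks_3d).max(dim=1).values   # (B, 128)
        return self.classifier(torch.cat([img_feat, pt_feat], dim=1))

# Example forward pass with random tensors standing in for real inputs.
model = GazeZoneFusionNet()
logits = model(torch.randn(4, 3, 128, 128), torch.randn(4, 68, 3))
print(logits.shape)  # torch.Size([4, 9])
```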
You Have a Point There: Object Selection Inside an Automobile Using Gaze, Head Pose and Finger Pointing
Proceedings of the 2020 International Conference on Multimodal Interaction Pub Date : 2020-10-21 DOI: 10.1145/3382507.3418836
Abdul Rafey Aftab, M. V. D. Beeck, M. Feld
{"title":"You Have a Point There: Object Selection Inside an Automobile Using Gaze, Head Pose and Finger Pointing","authors":"Abdul Rafey Aftab, M. V. D. Beeck, M. Feld","doi":"10.1145/3382507.3418836","DOIUrl":"https://doi.org/10.1145/3382507.3418836","url":null,"abstract":"Sophisticated user interaction in the automotive industry is a fast emerging topic. Mid-air gestures and speech already have numerous applications for driver-car interaction. Additionally, multimodal approaches are being developed to leverage the use of multiple sensors for added advantages. In this paper, we propose a fast and practical multimodal fusion method based on machine learning for the selection of various control modules in an automotive vehicle. The modalities taken into account are gaze, head pose and finger pointing gesture. Speech is used only as a trigger for fusion. Single modality has previously been used numerous times for recognition of the user's pointing direction. We, however, demonstrate how multiple inputs can be fused together to enhance the recognition performance. Furthermore, we compare different deep neural network architectures against conventional Machine Learning methods, namely Support Vector Regression and Random Forests, and show the enhancements in the pointing direction accuracy using deep learning. The results suggest a great potential for the use of multimodal inputs that can be applied to more use cases in the vehicle.","PeriodicalId":402394,"journal":{"name":"Proceedings of the 2020 International Conference on Multimodal Interaction","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121217645","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
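As a rough illustration of the conventional baselines named in the abstract above (Support Vector Regression and Random Forests) applied to fused gaze, head-pose and finger-pointing features, here is a small scikit-learn sketch on synthetic data. The feature layout and the 2D target are assumptions for illustration, not the paper's actual setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

# Synthetic stand-in data: each sample concatenates a gaze vector, a head-pose
# vector and a finger-pointing ray (this layout is an assumption); the target
# is a 2D location on a control surface.
rng = np.random.default_rng(0)
X = np.hstack([rng.normal(size=(200, 3)),   # gaze direction
               rng.normal(size=(200, 3)),   # head pose (yaw, pitch, roll)
               rng.normal(size=(200, 6))])  # pointing ray origin + direction
y = rng.normal(size=(200, 2))               # pointed-at location

# Conventional baselines mentioned in the abstract: SVR and Random Forests.
# SVR is single-output, so it is wrapped to handle the 2D target.
svr = MultiOutputRegressor(SVR(kernel="rbf", C=1.0))
rf = RandomForestRegressor(n_estimators=100, random_state=0)

svr.fit(X[:150], y[:150])
rf.fit(X[:150], y[:150])
print("SVR MSE:", np.mean((svr.predict(X[150:]) - y[150:]) ** 2))
print("RF  MSE:", np.mean((rf.predict(X[150:]) - y[150:]) ** 2))
```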
Towards Multimodal Human-Like Characteristics and Expressive Visual Prosody in Virtual Agents
Proceedings of the 2020 International Conference on Multimodal Interaction Pub Date : 2020-10-21 DOI: 10.1145/3382507.3421155
Mireille Fares
{"title":"Towards Multimodal Human-Like Characteristics and Expressive Visual Prosody in Virtual Agents","authors":"Mireille Fares","doi":"10.1145/3382507.3421155","DOIUrl":"https://doi.org/10.1145/3382507.3421155","url":null,"abstract":"One of the key challenges in designing Embodied Conversational Agents (ECA) is to produce human-like gestural and visual prosody expressivity. Another major challenge is to maintain the interlocutor's attention by adapting the agent's behavior to the interlocutor's multimodal behavior. This paper outlines my PhD research plan that aims to develop convincing expressive and natural behavior in ECAs and to explore and model the mechanisms that govern human-agent multimodal interaction. Additionally, I describe in this paper my first PhD milestone which focuses on developing an end-to-end LSTM Neural Network model for upper-face gestures generation. The main task consists of building a model that can produce expressive and coherent upper-face gestures while considering multiple modalities: speech audio, text, and action units.","PeriodicalId":402394,"journal":{"name":"Proceedings of the 2020 International Conference on Multimodal Interaction","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124122641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
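A minimal PyTorch sketch of the kind of model the milestone above describes: an LSTM that maps per-frame speech, text and action-unit features to upper-face gesture parameters. The feature dimensions and the output parameterization are illustrative assumptions, not the thesis architecture.

```python
import torch
import torch.nn as nn

class UpperFaceGestureLSTM(nn.Module):
    """Sketch of a sequence model mapping per-frame multimodal features
    (audio, text, action units) to upper-face gesture parameters.
    All feature sizes here are assumptions for illustration."""

    def __init__(self, audio_dim=40, text_dim=300, au_dim=17, out_dim=10):
        super().__init__()
        self.lstm = nn.LSTM(input_size=audio_dim + text_dim + au_dim,
                            hidden_size=128, num_layers=2, batch_first=True)
        self.head = nn.Linear(128, out_dim)  # e.g. eyebrow/eyelid parameters

    def forward(self, audio, text, aus):
        x = torch.cat([audio, text, aus], dim=-1)  # (B, T, D) fused frames
        h, _ = self.lstm(x)
        return self.head(h)                        # (B, T, out_dim)

# Example forward pass on random stand-in features (2 clips, 100 frames).
model = UpperFaceGestureLSTM()
out = model(torch.randn(2, 100, 40), torch.randn(2, 100, 300), torch.randn(2, 100, 17))
print(out.shape)  # torch.Size([2, 100, 10])
```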
Automating Facilitation and Documentation of Collaborative Ideation Processes
Proceedings of the 2020 International Conference on Multimodal Interaction Pub Date : 2020-10-21 DOI: 10.1145/3382507.3421158
Matthias Merk
{"title":"Automating Facilitation and Documentation of Collaborative Ideation Processes","authors":"Matthias Merk","doi":"10.1145/3382507.3421158","DOIUrl":"https://doi.org/10.1145/3382507.3421158","url":null,"abstract":"My research is is in the field of computer supported and enabled innovation processes, in particular focusing on the first phases of ideation in a co-located environment. I'm developing a concept for documenting, tracking and enhancing creative ideation processes. Base of this concept are key figures derived from various system within the ideation sessions. The system designed in my doctoral thesis enables interdisciplinary teams to kick-start creativity by automating facilitation, moderation, creativity support and documentation of the process. Using the example of brainstorming, a standing table is equipped with camera and microphone based sensing as well as multiple ways of interaction and visualization through projection and LED lights. The user interaction with the table is implicit and based on real time metadata generated by the users of the system. System actions are calculated based on what is happening on the table using object recognition. Everything on the table influences the system thus making it into a multimodal input and output device with implicit interaction. While the technical aspects of my research are close to be done, the more problematic part of evaluation will benefit from feedback from the specialists for multimodal interaction at ICMI20.","PeriodicalId":402394,"journal":{"name":"Proceedings of the 2020 International Conference on Multimodal Interaction","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126456722","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Bridging Social Sciences and AI for Understanding Child Behaviour
Proceedings of the 2020 International Conference on Multimodal Interaction Pub Date : 2020-10-21 DOI: 10.1145/3382507.3419745
Heysem Kaya, R. Hessels, M. Najafian, S. Hanekamp, Saeid Safavi
{"title":"Bridging Social Sciences and AI for Understanding Child Behaviour","authors":"Heysem Kaya, R. Hessels, M. Najafian, S. Hanekamp, Saeid Safavi","doi":"10.1145/3382507.3419745","DOIUrl":"https://doi.org/10.1145/3382507.3419745","url":null,"abstract":"Child behaviour is a topic of wide scientific interest among many different disciplines, including social and behavioural sciences and artificial intelligence (AI). In this workshop, we aimed to connect researchers from these fields to address topics such as the usage of AI to better understand and model child behavioural and developmental processes, challenges and opportunities for AI in large-scale child behaviour analysis and implementing explainable ML/AI on sensitive child data. The workshop served as a successful first step towards this goal and attracted contributions from different research disciplines on the analysis of child behaviour. This paper provides a summary of the activities of the workshop and the accepted papers and abstracts.","PeriodicalId":402394,"journal":{"name":"Proceedings of the 2020 International Conference on Multimodal Interaction","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126465489","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Multi-rate Attention Based GRU Model for Engagement Prediction
Proceedings of the 2020 International Conference on Multimodal Interaction Pub Date : 2020-10-21 DOI: 10.1145/3382507.3417965
Bin Zhu, Xinjie Lan, Xin Guo, K. Barner, C. Boncelet
{"title":"Multi-rate Attention Based GRU Model for Engagement Prediction","authors":"Bin Zhu, Xinjie Lan, Xin Guo, K. Barner, C. Boncelet","doi":"10.1145/3382507.3417965","DOIUrl":"https://doi.org/10.1145/3382507.3417965","url":null,"abstract":"Engagement detection is essential in many areas such as driver attention tracking, employee engagement monitoring, and student engagement evaluation. In this paper, we propose a novel approach using attention based hybrid deep models for the 8th Emotion Recognition in the Wild (EmotiW 2020) Grand Challenge in the category of engagement prediction in the wild EMOTIW2020. The task aims to predict the engagement intensity of subjects in videos, and the subjects are students watching educational videos from Massive Open Online Courses (MOOCs). To complete the task, we propose a hybrid deep model based on multi-rate and multi-instance attention. The novelty of the proposed model can be summarized in three aspects: (a) an attention based Gated Recurrent Unit (GRU) deep network, (b) heuristic multi-rate processing on video based data, and (c) a rigorous and accurate ensemble model. Experimental results on the validation set and test set show that our method makes promising improvements, achieving a competitively low MSE of 0.0541 on the test set, improving on the baseline results by 64%. The proposed model won the first place in the engagement prediction in the wild challenge.","PeriodicalId":402394,"journal":{"name":"Proceedings of the 2020 International Conference on Multimodal Interaction","volume":"130 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131770964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 20
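The attention-based GRU component mentioned in the abstract above can be sketched in a few lines of PyTorch. The sketch pools GRU states with a learned attention over time and regresses a single engagement intensity; it omits the multi-rate processing and the ensemble described in the paper, and all dimensions are assumptions.

```python
import torch
import torch.nn as nn

class AttentionGRURegressor(nn.Module):
    """Sketch of an attention-pooled GRU for engagement intensity regression.
    The published model fuses several feature streams sampled at multiple
    rates and ensembles variants; a single feature sequence is used here."""

    def __init__(self, feat_dim=64, hidden=128):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)   # scores each timestep
        self.out = nn.Linear(hidden, 1)    # engagement intensity in [0, 1]

    def forward(self, x):                        # x: (B, T, feat_dim)
        h, _ = self.gru(x)                       # (B, T, hidden)
        w = torch.softmax(self.attn(h), dim=1)   # attention weights over time
        pooled = (w * h).sum(dim=1)              # (B, hidden)
        return torch.sigmoid(self.out(pooled)).squeeze(-1)

# Example: 8 clips, 150 frames of 64-dim features each.
model = AttentionGRURegressor()
pred = model(torch.randn(8, 150, 64))
print(pred.shape)  # torch.Size([8])
```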
ROSMI: A Multimodal Corpus for Map-based Instruction-Giving
Proceedings of the 2020 International Conference on Multimodal Interaction Pub Date : 2020-10-21 DOI: 10.1145/3382507.3418861
Miltiadis Marios Katsakioris, Ioannis Konstas, P. Mignotte, Helen F. Hastie
{"title":"ROSMI: A Multimodal Corpus for Map-based Instruction-Giving","authors":"Miltiadis Marios Katsakioris, Ioannis Konstas, P. Mignotte, Helen F. Hastie","doi":"10.1145/3382507.3418861","DOIUrl":"https://doi.org/10.1145/3382507.3418861","url":null,"abstract":"We present the publicly-available Robot Open Street Map Instructions (ROSMI) corpus: a rich multimodal dataset of map and natural language instruction pairs that was collected via crowdsourcing. The goal of this corpus is to aid in the advancement of state-of-the-art visual-dialogue tasks, including reference resolution and robot-instruction understanding. The domain described here concerns robots and autonomous systems being used for inspection and emergency response. The ROSMI corpus is unique in that it captures interaction grounded in map-based visual stimuli that is both human-readable but also contains rich metadata that is needed to plan and deploy robots and autonomous systems, thus facilitating human-robot teaming.","PeriodicalId":402394,"journal":{"name":"Proceedings of the 2020 International Conference on Multimodal Interaction","volume":"2015 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125641220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
MORSE: MultimOdal sentiment analysis for Real-life SEttings
Proceedings of the 2020 International Conference on Multimodal Interaction Pub Date : 2020-10-21 DOI: 10.1145/3382507.3418821
Yiqun Yao, Verónica Pérez-Rosas, M. Abouelenien, Mihai Burzo
{"title":"MORSE: MultimOdal sentiment analysis for Real-life SEttings","authors":"Yiqun Yao, Verónica Pérez-Rosas, M. Abouelenien, Mihai Burzo","doi":"10.1145/3382507.3418821","DOIUrl":"https://doi.org/10.1145/3382507.3418821","url":null,"abstract":"Multimodal sentiment analysis aims to detect and classify sentiment expressed in multimodal data. Research to date has focused on datasets with a large number of training samples, manual transcriptions, and nearly-balanced sentiment labels. However, data collection in real settings often leads to small datasets with noisy transcriptions and imbalanced label distributions, which are therefore significantly more challenging than in controlled settings. In this work, we introduce MORSE, a domain-specific dataset for MultimOdal sentiment analysis in Real-life SEttings. The dataset consists of 2,787 video clips extracted from 49 interviews with panelists in a product usage study, with each clip annotated for positive, negative, or neutral sentiment. The characteristics of MORSE include noisy transcriptions from raw videos, naturally imbalanced label distribution, and scarcity of minority labels. To address the challenging real-life settings in MORSE, we propose a novel two-step fine-tuning method for multimodal sentiment classification using transfer learning and the Transformer model architecture; our method starts with a pre-trained language model and one step of fine-tuning on the language modality, followed by the second step of joint fine-tuning that incorporates the visual and audio modalities. Experimental results show that while MORSE is challenging for various baseline models such as SVM and Transformer, our two-step fine-tuning method is able to capture the dataset characteristics and effectively address the challenges. Our method outperforms related work that uses both single and multiple modalities in the same transfer learning settings.","PeriodicalId":402394,"journal":{"name":"Proceedings of the 2020 International Conference on Multimodal Interaction","volume":"227 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114988310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
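A schematic PyTorch sketch of the two-step fine-tuning idea from the abstract above: a text-only classification head for the first fine-tuning step, then a fusion head over text, visual and audio features for the joint second step. The stand-in encoder and all feature sizes are assumptions for illustration; the actual method starts from a pre-trained Transformer language model.

```python
import torch
import torch.nn as nn

class TwoStepMultimodalClassifier(nn.Module):
    """Schematic of two-step fine-tuning: step 1 fine-tunes a pre-trained
    text encoder on language alone; step 2 jointly fine-tunes with visual
    and audio features added. The encoder here is a stand-in module."""

    def __init__(self, text_encoder, text_dim=768, vis_dim=128, aud_dim=64,
                 num_classes=3):
        super().__init__()
        self.text_encoder = text_encoder                   # assumed pre-trained
        self.text_head = nn.Linear(text_dim, num_classes)  # used in step 1
        self.fusion_head = nn.Sequential(                  # used in step 2
            nn.Linear(text_dim + vis_dim + aud_dim, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, text_feats, vis_feats=None, aud_feats=None):
        t = self.text_encoder(text_feats)                  # (B, text_dim)
        if vis_feats is None:                              # step 1: text only
            return self.text_head(t)
        return self.fusion_head(torch.cat([t, vis_feats, aud_feats], dim=-1))

# Stand-in "pre-trained" text encoder, for illustration only.
encoder = nn.Sequential(nn.Linear(300, 768), nn.ReLU())
model = TwoStepMultimodalClassifier(encoder)

# Step 1: fine-tune on the language modality alone.
step1_logits = model(torch.randn(4, 300))
# Step 2: joint fine-tuning with visual and audio features included.
step2_logits = model(torch.randn(4, 300), torch.randn(4, 128), torch.randn(4, 64))
print(step1_logits.shape, step2_logits.shape)  # torch.Size([4, 3]) twice
```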
Speech, Voice, Text, and Meaning: A Multidisciplinary Approach to Interview Data through the use of digital tools
Proceedings of the 2020 International Conference on Multimodal Interaction Pub Date : 2020-10-21 DOI: 10.1145/3382507.3420054
A. V. Hessen, S. Calamai, H. V. D. Heuvel, S. Scagliola, N. Karrouche, J. Beeken, Louise Corti, C. Draxler
{"title":"Speech, Voice, Text, and Meaning: A Multidisciplinary Approach to Interview Data through the use of digital tools","authors":"A. V. Hessen, S. Calamai, H. V. D. Heuvel, S. Scagliola, N. Karrouche, J. Beeken, Louise Corti, C. Draxler","doi":"10.1145/3382507.3420054","DOIUrl":"https://doi.org/10.1145/3382507.3420054","url":null,"abstract":"Interview data is multimodal data: it consists of speech sound, facial expression and gestures, captured in a particular situation, and containing textual information and emotion. This workshop shows how a multidisciplinary approach may exploit the full potential of interview data. The workshop first gives a systematic overview of the research fields working with interview data. It then presents the speech technology currently available to support transcribing and annotating interview data, such as automatic speech recognition, speaker diarization, and emotion detection. Finally, scholars who work with interview data and tools may present their work and discover how to make use of existing technology.","PeriodicalId":402394,"journal":{"name":"Proceedings of the 2020 International Conference on Multimodal Interaction","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121312260","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Robot Assisted Diagnosis of Autism in Children
Proceedings of the 2020 International Conference on Multimodal Interaction Pub Date : 2020-10-21 DOI: 10.1145/3382507.3421162
B. Ashwini
{"title":"Robot Assisted Diagnosis of Autism in Children","authors":"B. Ashwini","doi":"10.1145/3382507.3421162","DOIUrl":"https://doi.org/10.1145/3382507.3421162","url":null,"abstract":"The diagnosis of autism spectrum disorder is cumbersome even for expert clinicians owing to the diversity in the symptoms exhibited by the children which depend on the severity of the disorder. Furthermore, the diagnosis is based on the behavioural observations and the developmental history of the child which has substantial dependence on the perspectives and interpretations of the specialists. In this paper, we present a robot-assisted diagnostic system for the assessment of behavioural symptoms in children for providing a reliable diagnosis. The robotic assistant is intended to support the specialist in administering the diagnostic task, perceiving and evaluating the task outcomes as well as the behavioural cues for assessment of symptoms and diagnosing the state of the child. Despite being used widely in education and intervention for children with autism (CWA), the application of robot assistance in diagnosis is less explored. Further, there have been limited studies addressing the acceptance and effectiveness of robot-assisted interventions for CWA in the Global South. We aim to develop a robot-assisted diagnostic framework for CWA to support the experts and study the viability of such a system in the Indian context.","PeriodicalId":402394,"journal":{"name":"Proceedings of the 2020 International Conference on Multimodal Interaction","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122353298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2