Proceedings of the 2020 International Conference on Multimodal Interaction最新文献_第7页

Extract the Gaze Multi-dimensional Information Analysis Driver Behavior 提取注视多维信息分析驾驶员行为

Proceedings of the 2020 International Conference on Multimodal Interaction Pub Date : 2020-10-21 DOI: 10.1145/3382507.3417972

Kui Lyu, Minghao Wang, Liyu Meng

{"title":"Extract the Gaze Multi-dimensional Information Analysis Driver Behavior","authors":"Kui Lyu, Minghao Wang, Liyu Meng","doi":"10.1145/3382507.3417972","DOIUrl":"https://doi.org/10.1145/3382507.3417972","url":null,"abstract":"Recent studies has been shown that most traffic accidents are related to the driver's engagement in the driving process. Driver gaze is considered as an important cue to monitor driver distraction. While there has been marked improvement in driver gaze region estimation systems, but there are many challenges exist like cross subject test, perspectives and sensor configuration. In this paper, we propose a Convolutional Neural Networks (CNNs) based multi-model fusion gaze zone estimation systems. Our method mainly consists of two blocks, which implemented the extraction of gaze features based on RGB images and estimation of gaze based on head pose features. Based on the original input image, first general face processing model were used to detect face and localize 3D landmarks, and then extract the most relevant facial information based on it. We implement three face alignment methods to normalize the face information. For the above image-based features, using a multi-input CNN classifier can get reliable classification accuracy. In addition, we design a 2D CNN based PointNet predict the head pose representation by 3D landmarks. Finally, we evaluate our best performance model on the Eighth EmotiW Driver Gaze Prediction sub-challenge test dataset. Our model has a competitive overall accuracy of 81.5144% gaze zone estimation ability on the cross-subject test dataset.","PeriodicalId":402394,"journal":{"name":"Proceedings of the 2020 International Conference on Multimodal Interaction","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125050975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 7

You Have a Point There: Object Selection Inside an Automobile Using Gaze, Head Pose and Finger Pointing 你有一个点:对象选择在汽车内使用凝视，头部姿势和手指指向

Proceedings of the 2020 International Conference on Multimodal Interaction Pub Date : 2020-10-21 DOI: 10.1145/3382507.3418836

Abdul Rafey Aftab, M. V. D. Beeck, M. Feld

{"title":"You Have a Point There: Object Selection Inside an Automobile Using Gaze, Head Pose and Finger Pointing","authors":"Abdul Rafey Aftab, M. V. D. Beeck, M. Feld","doi":"10.1145/3382507.3418836","DOIUrl":"https://doi.org/10.1145/3382507.3418836","url":null,"abstract":"Sophisticated user interaction in the automotive industry is a fast emerging topic. Mid-air gestures and speech already have numerous applications for driver-car interaction. Additionally, multimodal approaches are being developed to leverage the use of multiple sensors for added advantages. In this paper, we propose a fast and practical multimodal fusion method based on machine learning for the selection of various control modules in an automotive vehicle. The modalities taken into account are gaze, head pose and finger pointing gesture. Speech is used only as a trigger for fusion. Single modality has previously been used numerous times for recognition of the user's pointing direction. We, however, demonstrate how multiple inputs can be fused together to enhance the recognition performance. Furthermore, we compare different deep neural network architectures against conventional Machine Learning methods, namely Support Vector Regression and Random Forests, and show the enhancements in the pointing direction accuracy using deep learning. The results suggest a great potential for the use of multimodal inputs that can be applied to more use cases in the vehicle.","PeriodicalId":402394,"journal":{"name":"Proceedings of the 2020 International Conference on Multimodal Interaction","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121217645","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 15

Towards Multimodal Human-Like Characteristics and Expressive Visual Prosody in Virtual Agents 虚拟代理的多模态类人特征与视觉韵律表达

Proceedings of the 2020 International Conference on Multimodal Interaction Pub Date : 2020-10-21 DOI: 10.1145/3382507.3421155

Mireille Fares

引用次数: 12

Automating Facilitation and Documentation of Collaborative Ideation Processes 协作构思过程的自动化促进和文档化

Proceedings of the 2020 International Conference on Multimodal Interaction Pub Date : 2020-10-21 DOI: 10.1145/3382507.3421158

Matthias Merk

{"title":"Automating Facilitation and Documentation of Collaborative Ideation Processes","authors":"Matthias Merk","doi":"10.1145/3382507.3421158","DOIUrl":"https://doi.org/10.1145/3382507.3421158","url":null,"abstract":"My research is is in the field of computer supported and enabled innovation processes, in particular focusing on the first phases of ideation in a co-located environment. I'm developing a concept for documenting, tracking and enhancing creative ideation processes. Base of this concept are key figures derived from various system within the ideation sessions. The system designed in my doctoral thesis enables interdisciplinary teams to kick-start creativity by automating facilitation, moderation, creativity support and documentation of the process. Using the example of brainstorming, a standing table is equipped with camera and microphone based sensing as well as multiple ways of interaction and visualization through projection and LED lights. The user interaction with the table is implicit and based on real time metadata generated by the users of the system. System actions are calculated based on what is happening on the table using object recognition. Everything on the table influences the system thus making it into a multimodal input and output device with implicit interaction. While the technical aspects of my research are close to be done, the more problematic part of evaluation will benefit from feedback from the specialists for multimodal interaction at ICMI20.","PeriodicalId":402394,"journal":{"name":"Proceedings of the 2020 International Conference on Multimodal Interaction","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126456722","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Bridging Social Sciences and AI for Understanding Child Behaviour 连接社会科学和人工智能来理解儿童行为

Proceedings of the 2020 International Conference on Multimodal Interaction Pub Date : 2020-10-21 DOI: 10.1145/3382507.3419745

Heysem Kaya, R. Hessels, M. Najafian, S. Hanekamp, Saeid Safavi

引用次数: 2

Multi-rate Attention Based GRU Model for Engagement Prediction 基于多速率注意力的用户粘性预测GRU模型

Proceedings of the 2020 International Conference on Multimodal Interaction Pub Date : 2020-10-21 DOI: 10.1145/3382507.3417965

Bin Zhu, Xinjie Lan, Xin Guo, K. Barner, C. Boncelet

{"title":"Multi-rate Attention Based GRU Model for Engagement Prediction","authors":"Bin Zhu, Xinjie Lan, Xin Guo, K. Barner, C. Boncelet","doi":"10.1145/3382507.3417965","DOIUrl":"https://doi.org/10.1145/3382507.3417965","url":null,"abstract":"Engagement detection is essential in many areas such as driver attention tracking, employee engagement monitoring, and student engagement evaluation. In this paper, we propose a novel approach using attention based hybrid deep models for the 8th Emotion Recognition in the Wild (EmotiW 2020) Grand Challenge in the category of engagement prediction in the wild EMOTIW2020. The task aims to predict the engagement intensity of subjects in videos, and the subjects are students watching educational videos from Massive Open Online Courses (MOOCs). To complete the task, we propose a hybrid deep model based on multi-rate and multi-instance attention. The novelty of the proposed model can be summarized in three aspects: (a) an attention based Gated Recurrent Unit (GRU) deep network, (b) heuristic multi-rate processing on video based data, and (c) a rigorous and accurate ensemble model. Experimental results on the validation set and test set show that our method makes promising improvements, achieving a competitively low MSE of 0.0541 on the test set, improving on the baseline results by 64%. The proposed model won the first place in the engagement prediction in the wild challenge.","PeriodicalId":402394,"journal":{"name":"Proceedings of the 2020 International Conference on Multimodal Interaction","volume":"130 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131770964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 20

ROSMI: A Multimodal Corpus for Map-based Instruction-Giving 基于地图的多模态语料库教学

Proceedings of the 2020 International Conference on Multimodal Interaction Pub Date : 2020-10-21 DOI: 10.1145/3382507.3418861

Miltiadis Marios Katsakioris, Ioannis Konstas, P. Mignotte, Helen F. Hastie

引用次数: 1

MORSE: MultimOdal sentiment analysis for Real-life SEttings 莫尔斯:多模态情感分析的现实生活设置

Proceedings of the 2020 International Conference on Multimodal Interaction Pub Date : 2020-10-21 DOI: 10.1145/3382507.3418821

Yiqun Yao, Verónica Pérez-Rosas, M. Abouelenien, Mihai Burzo

{"title":"MORSE: MultimOdal sentiment analysis for Real-life SEttings","authors":"Yiqun Yao, Verónica Pérez-Rosas, M. Abouelenien, Mihai Burzo","doi":"10.1145/3382507.3418821","DOIUrl":"https://doi.org/10.1145/3382507.3418821","url":null,"abstract":"Multimodal sentiment analysis aims to detect and classify sentiment expressed in multimodal data. Research to date has focused on datasets with a large number of training samples, manual transcriptions, and nearly-balanced sentiment labels. However, data collection in real settings often leads to small datasets with noisy transcriptions and imbalanced label distributions, which are therefore significantly more challenging than in controlled settings. In this work, we introduce MORSE, a domain-specific dataset for MultimOdal sentiment analysis in Real-life SEttings. The dataset consists of 2,787 video clips extracted from 49 interviews with panelists in a product usage study, with each clip annotated for positive, negative, or neutral sentiment. The characteristics of MORSE include noisy transcriptions from raw videos, naturally imbalanced label distribution, and scarcity of minority labels. To address the challenging real-life settings in MORSE, we propose a novel two-step fine-tuning method for multimodal sentiment classification using transfer learning and the Transformer model architecture; our method starts with a pre-trained language model and one step of fine-tuning on the language modality, followed by the second step of joint fine-tuning that incorporates the visual and audio modalities. Experimental results show that while MORSE is challenging for various baseline models such as SVM and Transformer, our two-step fine-tuning method is able to capture the dataset characteristics and effectively address the challenges. Our method outperforms related work that uses both single and multiple modalities in the same transfer learning settings.","PeriodicalId":402394,"journal":{"name":"Proceedings of the 2020 International Conference on Multimodal Interaction","volume":"227 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114988310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 6

Speech, Voice, Text, and Meaning: A Multidisciplinary Approach to Interview Data through the use of digital tools 语音，声音，文本和意义:通过使用数字工具来访问数据的多学科方法

Proceedings of the 2020 International Conference on Multimodal Interaction Pub Date : 2020-10-21 DOI: 10.1145/3382507.3420054

A. V. Hessen, S. Calamai, H. V. D. Heuvel, S. Scagliola, N. Karrouche, J. Beeken, Louise Corti, C. Draxler

引用次数: 2

Robot Assisted Diagnosis of Autism in Children 机器人辅助儿童自闭症诊断

Proceedings of the 2020 International Conference on Multimodal Interaction Pub Date : 2020-10-21 DOI: 10.1145/3382507.3421162

B. Ashwini

{"title":"Robot Assisted Diagnosis of Autism in Children","authors":"B. Ashwini","doi":"10.1145/3382507.3421162","DOIUrl":"https://doi.org/10.1145/3382507.3421162","url":null,"abstract":"The diagnosis of autism spectrum disorder is cumbersome even for expert clinicians owing to the diversity in the symptoms exhibited by the children which depend on the severity of the disorder. Furthermore, the diagnosis is based on the behavioural observations and the developmental history of the child which has substantial dependence on the perspectives and interpretations of the specialists. In this paper, we present a robot-assisted diagnostic system for the assessment of behavioural symptoms in children for providing a reliable diagnosis. The robotic assistant is intended to support the specialist in administering the diagnostic task, perceiving and evaluating the task outcomes as well as the behavioural cues for assessment of symptoms and diagnosing the state of the child. Despite being used widely in education and intervention for children with autism (CWA), the application of robot assistance in diagnosis is less explored. Further, there have been limited studies addressing the acceptance and effectiveness of robot-assisted interventions for CWA in the Global South. We aim to develop a robot-assisted diagnostic framework for CWA to support the experts and study the viability of such a system in the Indian context.","PeriodicalId":402394,"journal":{"name":"Proceedings of the 2020 International Conference on Multimodal Interaction","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122353298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 2