Proceedings of the 20th ACM International Conference on Multimodal Interaction最新文献_第9页

Estimating Head Motion from Egocentric Vision 从自我中心视觉估计头部运动

Proceedings of the 20th ACM International Conference on Multimodal Interaction Pub Date : 2018-10-02 DOI: 10.1145/3242969.3242982

Satoshi Tsutsui, S. Bambach, David J. Crandall, Chen Yu

引用次数: 2

Using Data-Driven Approach for Modeling Timing Parameters of American Sign Language 基于数据驱动的美国手语时序参数建模方法

Proceedings of the 20th ACM International Conference on Multimodal Interaction Pub Date : 2018-10-02 DOI: 10.1145/3242969.3264965

Sedeeq Al-khazraji

{"title":"Using Data-Driven Approach for Modeling Timing Parameters of American Sign Language","authors":"Sedeeq Al-khazraji","doi":"10.1145/3242969.3264965","DOIUrl":"https://doi.org/10.1145/3242969.3264965","url":null,"abstract":"While many organizations provide a website in multiple languages, few provide a sign-language version for deaf users, many of whom have lower written-language literacy. Rather than providing difficult-to-update videos of humans, a more practical solution would be for the organization to specify a script (representing the sequence of words) to generate a sign-language animation. The challenge is we must select the accurate speed and timing of signs. In this work, focused on American Sign Language (ASL), motion-capture data recorded from humans is used to train machine learning models to calculate realistic timing for ASL animation movement, with an initial focus on inserting prosodic breaks (pauses), adjusting the pause durations for these pauses, and adjusting differentials signing rate for ASL animations based on the sentence syntax and other features. The methodology includes processing and cleaning data from an ASL corpus with motion-capture recordings, selecting features, and building machine learning models to predict where to insert pauses, length of pauses, and signing speed. The resulting models were evaluated using a cross-validation approach to train and test multiple models on various partitions of the dataset, to compare various learning algorithms and subsets of features. In addition, a user-based evaluation was conducted in which native ASL signers evaluated animations generated based on these models. This paper summarizes the motivations for this work, proposed solution, and the potential contribution of this work. This paper describes both completed work and some additional future research plans.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130959848","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

Exploring the Design of Audio-Kinetic Graphics for Education 教学用声动图形设计的探索

Proceedings of the 20th ACM International Conference on Multimodal Interaction Pub Date : 2018-10-02 DOI: 10.1145/3242969.3243004

Annika Muehlbradt, Madhur Atreya, Darren Guinness, Shaun K. Kane

引用次数: 7

Attention Network for Engagement Prediction in the Wild 野外参与性预测的注意网络

Proceedings of the 20th ACM International Conference on Multimodal Interaction Pub Date : 2018-10-02 DOI: 10.1145/3242969.3264972

A. Kaur

{"title":"Attention Network for Engagement Prediction in the Wild","authors":"A. Kaur","doi":"10.1145/3242969.3264972","DOIUrl":"https://doi.org/10.1145/3242969.3264972","url":null,"abstract":"Analysis of the student engagement in an e-learning environment would facilitate effective task accomplishment and learning. Generally, engagement/disengagement can be estimated from facial expressions, body movements and gaze pattern. The focus of this Ph.D. work is to explore automatic student engagement assessment while watching Massive Open Online Courses (MOOCs) video material in the real-world environment. Most of the work till now in this area has been focusing on engagement assessment in lab-controlled environments. There are several challenges involved in moving from lab-controlled environments to real-world scenarios such as face tracking, illumination, occlusion, and context. The early work in this Ph.D. project explores the student engagement while watching MOOCs. The unavailability of any publicly available dataset in the domain of user engagement motivates to collect dataset in this direction. The dataset contains 195 videos captured from 78 subjects which are about 16.5 hours of recording. This dataset is independently annotated by different labelers and final label is derived from the statistical analysis of the individual labels given by the different annotators. Various traditional machine learning algorithm and deep learning based networks are used to derive baseline of the dataset. Engagement prediction and localization are modeled as Multi-Instance Learning (MIL) problem. In this work, the importance of Hierarchical Attention Network (HAN) is studied. This architecture is motivated from the hierarchical nature of the problem where a video is made up of segments and segments are made up of frames.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125227537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 3

Group-Level Emotion Recognition Using Hybrid Deep Models Based on Faces, Scenes, Skeletons and Visual Attentions 基于人脸、场景、骨架和视觉注意的混合深度模型的群体级情感识别

Proceedings of the 20th ACM International Conference on Multimodal Interaction Pub Date : 2018-10-02 DOI: 10.1145/3242969.3264990

Xin Guo, Bin Zhu, Luisa F. Polanía, C. Boncelet, K. Barner

引用次数: 30

A Generative Approach for Dynamically Varying Photorealistic Facial Expressions in Human-Agent Interactions 人机交互中动态变化逼真面部表情的生成方法

Proceedings of the 20th ACM International Conference on Multimodal Interaction Pub Date : 2018-10-02 DOI: 10.1145/3242969.3243031

Yuchi Huang, Saad M. Khan

{"title":"A Generative Approach for Dynamically Varying Photorealistic Facial Expressions in Human-Agent Interactions","authors":"Yuchi Huang, Saad M. Khan","doi":"10.1145/3242969.3243031","DOIUrl":"https://doi.org/10.1145/3242969.3243031","url":null,"abstract":"This paper presents an approach for generating photorealistic video sequences of dynamically varying facial expressions in human-agent interactions. To this end, we study human-human interactions to model the relationship and influence of one individual's facial expressions in the reaction of the other. We introduce a two level optimization of generative adversarial models, wherein the first stage generates a dynamically varying sequence of the agent's face sketch conditioned on facial expression features derived from the interacting human partner. This serves as an intermediate representation, which is used to condition a second stage generative model to synthesize high-quality video of the agent face. Our approach uses a novel L1 regularization term computed from layer features of the discriminator, which are integrated with the generator objective in the GAN model. Session constraints are also imposed on video frame generation to ensure appearance consistency between consecutive frames. We demonstrated that our model is effective at generating visually compelling facial expressions. Moreover, we quantitatively showed that agent facial expressions in the generated video clips reflect valid emotional reactions to behavior of the human partner.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117118494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 7

Improving Object Disambiguation from Natural Language using Empirical Models 利用经验模型改进自然语言对象消歧

Proceedings of the 20th ACM International Conference on Multimodal Interaction Pub Date : 2018-10-02 DOI: 10.1145/3242969.3243025

Daniel Prendergast, D. Szafir

{"title":"Improving Object Disambiguation from Natural Language using Empirical Models","authors":"Daniel Prendergast, D. Szafir","doi":"10.1145/3242969.3243025","DOIUrl":"https://doi.org/10.1145/3242969.3243025","url":null,"abstract":"Robots, virtual assistants, and other intelligent agents need to effectively interpret verbal references to environmental objects in order to successfully interact and collaborate with humans in complex tasks. However, object disambiguation can be a challenging task due to ambiguities in natural language. To reduce uncertainty when describing an object, humans often use a combination of unique object features and locative prepositions --prepositional phrases that describe where an object is located relative to other features (i.e., reference objects) in a scene. We present a new system for object disambiguation in cluttered environments based on probabilistic models of unique object features and spatial relationships. Our work extends prior models of spatial relationship semantics by collecting and encoding empirical data from a series of crowdsourced studies to better understand how and when people use locative prepositions, how reference objects are chosen, and how to model prepositional geometry in 3D space (e.g., capturing distinctions between \"next to\" and \"beside\"). Our approach also introduces new techniques for responding to compound locative phrases of arbitrary complexity and proposes a new metric for disambiguation confidence. An experimental validation revealed our method can improve object disambiguation accuracy and performance over past approaches.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116155885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 4

EyeLinks

Proceedings of the 20th ACM International Conference on Multimodal Interaction Pub Date : 2018-10-02 DOI: 10.1145/3242969.3243021

P. Figueirêdo, M. J. Fonseca

引用次数: 3

Deep Recurrent Multi-instance Learning with Spatio-temporal Features for Engagement Intensity Prediction 基于时空特征的深度循环多实例学习用于参与强度预测

Proceedings of the 20th ACM International Conference on Multimodal Interaction Pub Date : 2018-10-02 DOI: 10.1145/3242969.3264981

Jianfei Yang, Kai Wang, Xiaojiang Peng, Y. Qiao

{"title":"Deep Recurrent Multi-instance Learning with Spatio-temporal Features for Engagement Intensity Prediction","authors":"Jianfei Yang, Kai Wang, Xiaojiang Peng, Y. Qiao","doi":"10.1145/3242969.3264981","DOIUrl":"https://doi.org/10.1145/3242969.3264981","url":null,"abstract":"This paper elaborates the winner approach for engagement intensity prediction in the EmotiW Challenge 2018. The task is to predict the engagement level of a subject when he or she is watching an educational video in diverse conditions and different environments. Our approach formulates the prediction task as a multi-instance regression problem. We divide an input video sequence into segments and calculate the temporal and spatial features of each segment for regressing the intensity. Subject engagement, that is intuitively related with body and face changes in time domain, can be characterized by long short-term memory (LSTM) network. Hence, we build a multi-modal regression model based on multi-instance mechanism as well as LSTM. To make full use of training and validation data, we train different models for different data split and conduct model ensemble finally. Experimental results show that our method achieves mean squared error (MSE) of 0.0717 in the validation set, which improves the baseline results by 28%. Our methods finally win the challenge with MSE of 0.0626 on the testing set.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"161 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122618831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 61

I Smell Trouble: Using Multiple Scents To Convey Driving-Relevant Information 我闻到了麻烦:使用多种气味来传达与驾驶相关的信息

Proceedings of the 20th ACM International Conference on Multimodal Interaction Pub Date : 2018-10-02 DOI: 10.1145/3242969.3243015

D. Dmitrenko, E. Maggioni, Marianna Obrist

引用次数: 28