Proceedings of the 20th ACM International Conference on Multimodal Interaction: Latest Publications

Estimating Head Motion from Egocentric Vision
Proceedings of the 20th ACM International Conference on Multimodal Interaction. Pub Date: 2018-10-02. DOI: 10.1145/3242969.3242982
Satoshi Tsutsui, S. Bambach, David J. Crandall, Chen Yu
{"title":"Estimating Head Motion from Egocentric Vision","authors":"Satoshi Tsutsui, S. Bambach, David J. Crandall, Chen Yu","doi":"10.1145/3242969.3242982","DOIUrl":"https://doi.org/10.1145/3242969.3242982","url":null,"abstract":"The recent availability of lightweight, wearable cameras allows for collecting video data from a \"first-person' perspective, capturing the visual world of the wearer in everyday interactive contexts. In this paper, we investigate how to exploit egocentric vision to infer multimodal behaviors from people wearing head-mounted cameras. More specifically, we estimate head (camera) motion from egocentric video, which can be further used to infer non-verbal behaviors such as head turns and nodding in multimodal interactions. We propose several approaches based on Convolutional Neural Networks (CNNs) that combine raw images and optical flow fields to learn to distinguish regions with optical flow caused by global ego-motion from those caused by other motion in a scene. Our results suggest that CNNs do not directly learn useful visual features with end-to-end training from raw images alone; instead, a better approach is to first extract optical flow explicitly and then train CNNs to integrate optical flow and visual information.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133647544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
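A minimal sketch of the two-stream idea described in the abstract above: one convolutional stream over the raw frame and one over a precomputed optical-flow field, fused to regress head motion. This is not the authors' architecture; the layer sizes and the 3-DoF motion output are assumptions.

```python
# Minimal two-stream sketch: fuse appearance (RGB) and precomputed optical flow
# to regress camera/head motion. Layer sizes and the 3-DoF output are assumptions.
import torch
import torch.nn as nn

class TwoStreamHeadMotionNet(nn.Module):
    def __init__(self, motion_dims: int = 3):  # e.g. yaw/pitch/roll velocities (assumed)
        super().__init__()
        def stream(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
        self.rgb_stream = stream(3)    # raw image stream
        self.flow_stream = stream(2)   # horizontal/vertical flow stream
        self.head = nn.Sequential(nn.Linear(64 + 64, 128), nn.ReLU(),
                                  nn.Linear(128, motion_dims))

    def forward(self, rgb, flow):
        # rgb: (B, 3, H, W); flow: (B, 2, H, W), precomputed by any optical-flow method
        feats = torch.cat([self.rgb_stream(rgb), self.flow_stream(flow)], dim=1)
        return self.head(feats)

# Example: a batch of 4 frames at 128x128
net = TwoStreamHeadMotionNet()
pred = net(torch.randn(4, 3, 128, 128), torch.randn(4, 2, 128, 128))
print(pred.shape)  # torch.Size([4, 3])
```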
Using Data-Driven Approach for Modeling Timing Parameters of American Sign Language
Proceedings of the 20th ACM International Conference on Multimodal Interaction. Pub Date: 2018-10-02. DOI: 10.1145/3242969.3264965
Sedeeq Al-khazraji
{"title":"Using Data-Driven Approach for Modeling Timing Parameters of American Sign Language","authors":"Sedeeq Al-khazraji","doi":"10.1145/3242969.3264965","DOIUrl":"https://doi.org/10.1145/3242969.3264965","url":null,"abstract":"While many organizations provide a website in multiple languages, few provide a sign-language version for deaf users, many of whom have lower written-language literacy. Rather than providing difficult-to-update videos of humans, a more practical solution would be for the organization to specify a script (representing the sequence of words) to generate a sign-language animation. The challenge is we must select the accurate speed and timing of signs. In this work, focused on American Sign Language (ASL), motion-capture data recorded from humans is used to train machine learning models to calculate realistic timing for ASL animation movement, with an initial focus on inserting prosodic breaks (pauses), adjusting the pause durations for these pauses, and adjusting differentials signing rate for ASL animations based on the sentence syntax and other features. The methodology includes processing and cleaning data from an ASL corpus with motion-capture recordings, selecting features, and building machine learning models to predict where to insert pauses, length of pauses, and signing speed. The resulting models were evaluated using a cross-validation approach to train and test multiple models on various partitions of the dataset, to compare various learning algorithms and subsets of features. In addition, a user-based evaluation was conducted in which native ASL signers evaluated animations generated based on these models. This paper summarizes the motivations for this work, proposed solution, and the potential contribution of this work. This paper describes both completed work and some additional future research plans.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130959848","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
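The paper's core modeling step, predicting where to insert prosodic pauses from sentence features, can be illustrated with an ordinary classifier evaluated by cross-validation, as in the sketch below. The feature names and random data are placeholders, not the ASL motion-capture corpus.

```python
# Sketch of the data-driven timing idea: classify whether to insert a prosodic
# break after each sign, using syntactic/positional features, evaluated with
# cross-validation. All features and data here are hypothetical placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical per-boundary features: [signs since last pause, clause-boundary flag,
# sentence length, signing rate so far]
rng = np.random.default_rng(0)
X = rng.random((500, 4))
y = rng.integers(0, 2, size=500)  # 1 = insert a pause at this sign boundary

clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=5, scoring="f1")
print("5-fold F1:", scores.mean())
```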
Exploring the Design of Audio-Kinetic Graphics for Education
Proceedings of the 20th ACM International Conference on Multimodal Interaction. Pub Date: 2018-10-02. DOI: 10.1145/3242969.3243004
Annika Muehlbradt, Madhur Atreya, Darren Guinness, Shaun K. Kane
{"title":"Exploring the Design of Audio-Kinetic Graphics for Education","authors":"Annika Muehlbradt, Madhur Atreya, Darren Guinness, Shaun K. Kane","doi":"10.1145/3242969.3243004","DOIUrl":"https://doi.org/10.1145/3242969.3243004","url":null,"abstract":"Creating tactile representations of visual information, especially moving images, is difficult due to a lack of available tactile computing technology and a lack of tools for authoring tactile information. To address these limitations, we developed a software framework that enables educators and other subject experts to create graphical representations that combine audio descriptions with kinetic motion. These audio-kinetic graphics can be played back using off-the-shelf computer hardware. We report on a study in which 10 educators developed content using our framework, and in which 18 people with vision impairments viewed these graphics on our output device. Our findings provide insights on how to translate knowledge of visual information to non-visual formats.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132347495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
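As a rough illustration of what an authored audio-kinetic graphic might look like as data, the sketch below pairs audio descriptions with timed 2D motion paths. The field names, playback loop, and callbacks are assumptions, not the authors' framework or file format.

```python
# A minimal sketch of an "audio-kinetic graphic" as data: timed segments that pair
# a spoken description with a 2D motion path for an output device. The structure
# and the speak()/move_to() callbacks are hypothetical placeholders.
import time
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Segment:
    description: str                  # audio description to speak
    path: List[Tuple[float, float]]   # normalized (x, y) waypoints for the kinetic output
    duration_s: float                 # how long the motion should take

def play(graphic: List[Segment], move_to, speak):
    """Drive the hypothetical speak()/move_to() callbacks for each segment."""
    for seg in graphic:
        speak(seg.description)
        step = seg.duration_s / max(len(seg.path), 1)
        for x, y in seg.path:
            move_to(x, y)
            time.sleep(step)

# Example: a horizontal line traced while its description is read aloud.
line = Segment("The line moves steadily from left to right.",
               [(i / 10, 0.5) for i in range(11)], 2.0)
play([line], move_to=lambda x, y: None, speak=print)
```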
Attention Network for Engagement Prediction in the Wild
Proceedings of the 20th ACM International Conference on Multimodal Interaction. Pub Date: 2018-10-02. DOI: 10.1145/3242969.3264972
A. Kaur
{"title":"Attention Network for Engagement Prediction in the Wild","authors":"A. Kaur","doi":"10.1145/3242969.3264972","DOIUrl":"https://doi.org/10.1145/3242969.3264972","url":null,"abstract":"Analysis of the student engagement in an e-learning environment would facilitate effective task accomplishment and learning. Generally, engagement/disengagement can be estimated from facial expressions, body movements and gaze pattern. The focus of this Ph.D. work is to explore automatic student engagement assessment while watching Massive Open Online Courses (MOOCs) video material in the real-world environment. Most of the work till now in this area has been focusing on engagement assessment in lab-controlled environments. There are several challenges involved in moving from lab-controlled environments to real-world scenarios such as face tracking, illumination, occlusion, and context. The early work in this Ph.D. project explores the student engagement while watching MOOCs. The unavailability of any publicly available dataset in the domain of user engagement motivates to collect dataset in this direction. The dataset contains 195 videos captured from 78 subjects which are about 16.5 hours of recording. This dataset is independently annotated by different labelers and final label is derived from the statistical analysis of the individual labels given by the different annotators. Various traditional machine learning algorithm and deep learning based networks are used to derive baseline of the dataset. Engagement prediction and localization are modeled as Multi-Instance Learning (MIL) problem. In this work, the importance of Hierarchical Attention Network (HAN) is studied. This architecture is motivated from the hierarchical nature of the problem where a video is made up of segments and segments are made up of frames.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125227537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
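A minimal sketch of the hierarchical attention idea: frame features are attention-pooled into segment vectors, and segment vectors are attention-pooled into a video-level engagement prediction. The simple learned attention, feature dimensions, and regressor head are assumptions, not the exact HAN used in this work.

```python
# Sketch of the hierarchical idea: attention-pool frame features into segment
# vectors, then attention-pool segments into a video-level engagement score.
import torch
import torch.nn as nn

class AttentionPool(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x):                      # x: (..., n_items, dim)
        w = torch.softmax(self.score(x), dim=-2)
        return (w * x).sum(dim=-2)             # attention-weighted average over items

class HierarchicalEngagementNet(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        self.frame_pool = AttentionPool(feat_dim)    # frames -> segment vector
        self.segment_pool = AttentionPool(feat_dim)  # segments -> video vector
        self.regressor = nn.Linear(feat_dim, 1)      # engagement level

    def forward(self, frame_feats):
        # frame_feats: (batch, n_segments, n_frames, feat_dim), e.g. pre-extracted features
        segment_vecs = self.frame_pool(frame_feats)
        video_vec = self.segment_pool(segment_vecs)
        return self.regressor(video_vec).squeeze(-1)

net = HierarchicalEngagementNet()
print(net(torch.randn(2, 8, 16, 128)).shape)  # torch.Size([2])
```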
Group-Level Emotion Recognition Using Hybrid Deep Models Based on Faces, Scenes, Skeletons and Visual Attentions
Proceedings of the 20th ACM International Conference on Multimodal Interaction. Pub Date: 2018-10-02. DOI: 10.1145/3242969.3264990
Xin Guo, Bin Zhu, Luisa F. Polanía, C. Boncelet, K. Barner
{"title":"Group-Level Emotion Recognition Using Hybrid Deep Models Based on Faces, Scenes, Skeletons and Visual Attentions","authors":"Xin Guo, Bin Zhu, Luisa F. Polanía, C. Boncelet, K. Barner","doi":"10.1145/3242969.3264990","DOIUrl":"https://doi.org/10.1145/3242969.3264990","url":null,"abstract":"This paper presents a hybrid deep learning network submitted to the 6th Emotion Recognition in the Wild (EmotiW 2018) Grand Challenge [9], in the category of group-level emotion recognition. Advanced deep learning models trained individually on faces, scenes, skeletons and salient regions using visual attention mechanisms are fused to classify the emotion of a group of people in an image as positive, neutral or negative. Experimental results show that the proposed hybrid network achieves 78.98% and 68.08% classification accuracy on the validation and testing sets, respectively. These results outperform the baseline of 64% and 61%, and achieved the first place in the challenge.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115352647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 30
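The fusion step can be illustrated with a simple late-fusion average of class probabilities from the face, scene, and skeleton streams, as sketched below. The equal weights and toy probability vectors are assumptions; the paper fuses trained deep networks rather than hand-written numbers.

```python
# Sketch of late fusion: separate models score an image from faces, the whole
# scene, and skeletons; their class probabilities are combined with weights.
import numpy as np

CLASSES = ["positive", "neutral", "negative"]

def fuse(prob_list, weights=None):
    probs = np.stack(prob_list)                      # (n_models, n_classes)
    w = np.ones(len(prob_list)) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()
    return (w[:, None] * probs).sum(axis=0)

face_p = np.array([0.7, 0.2, 0.1])      # e.g. output of a face-stream model
scene_p = np.array([0.5, 0.3, 0.2])     # scene-stream model
skeleton_p = np.array([0.6, 0.3, 0.1])  # skeleton-stream model

fused = fuse([face_p, scene_p, skeleton_p])
print(CLASSES[int(fused.argmax())], fused)
```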
A Generative Approach for Dynamically Varying Photorealistic Facial Expressions in Human-Agent Interactions
Proceedings of the 20th ACM International Conference on Multimodal Interaction. Pub Date: 2018-10-02. DOI: 10.1145/3242969.3243031
Yuchi Huang, Saad M. Khan
{"title":"A Generative Approach for Dynamically Varying Photorealistic Facial Expressions in Human-Agent Interactions","authors":"Yuchi Huang, Saad M. Khan","doi":"10.1145/3242969.3243031","DOIUrl":"https://doi.org/10.1145/3242969.3243031","url":null,"abstract":"This paper presents an approach for generating photorealistic video sequences of dynamically varying facial expressions in human-agent interactions. To this end, we study human-human interactions to model the relationship and influence of one individual's facial expressions in the reaction of the other. We introduce a two level optimization of generative adversarial models, wherein the first stage generates a dynamically varying sequence of the agent's face sketch conditioned on facial expression features derived from the interacting human partner. This serves as an intermediate representation, which is used to condition a second stage generative model to synthesize high-quality video of the agent face. Our approach uses a novel L1 regularization term computed from layer features of the discriminator, which are integrated with the generator objective in the GAN model. Session constraints are also imposed on video frame generation to ensure appearance consistency between consecutive frames. We demonstrated that our model is effective at generating visually compelling facial expressions. Moreover, we quantitatively showed that agent facial expressions in the generated video clips reflect valid emotional reactions to behavior of the human partner.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117118494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
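A sketch of the L1 regularization idea described above: an L1 penalty between discriminator layer features of real and generated frames, added to the adversarial generator loss. The toy discriminator and the weight lambda_fm are assumptions, not the paper's networks or hyperparameters.

```python
# Sketch of an L1 feature-matching regularizer: penalize the L1 distance between
# discriminator layer features of real and generated frames, and add it to the
# usual adversarial generator loss.
import torch
import torch.nn as nn

class TinyDiscriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        )
        self.classifier = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1))

    def forward(self, x):
        f = self.features(x)           # layer features reused for the L1 term
        return self.classifier(f), f

def generator_loss(disc, fake, real, lambda_fm=10.0):
    logits_fake, feat_fake = disc(fake)
    with torch.no_grad():
        _, feat_real = disc(real)
    adv = nn.functional.binary_cross_entropy_with_logits(
        logits_fake, torch.ones_like(logits_fake))       # try to fool the discriminator
    fm = nn.functional.l1_loss(feat_fake, feat_real)     # L1 feature-matching term
    return adv + lambda_fm * fm

disc = TinyDiscriminator()
loss = generator_loss(disc, torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64))
print(loss.item())
```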
Improving Object Disambiguation from Natural Language using Empirical Models
Proceedings of the 20th ACM International Conference on Multimodal Interaction. Pub Date: 2018-10-02. DOI: 10.1145/3242969.3243025
Daniel Prendergast, D. Szafir
{"title":"Improving Object Disambiguation from Natural Language using Empirical Models","authors":"Daniel Prendergast, D. Szafir","doi":"10.1145/3242969.3243025","DOIUrl":"https://doi.org/10.1145/3242969.3243025","url":null,"abstract":"Robots, virtual assistants, and other intelligent agents need to effectively interpret verbal references to environmental objects in order to successfully interact and collaborate with humans in complex tasks. However, object disambiguation can be a challenging task due to ambiguities in natural language. To reduce uncertainty when describing an object, humans often use a combination of unique object features and locative prepositions --prepositional phrases that describe where an object is located relative to other features (i.e., reference objects) in a scene. We present a new system for object disambiguation in cluttered environments based on probabilistic models of unique object features and spatial relationships. Our work extends prior models of spatial relationship semantics by collecting and encoding empirical data from a series of crowdsourced studies to better understand how and when people use locative prepositions, how reference objects are chosen, and how to model prepositional geometry in 3D space (e.g., capturing distinctions between \"next to\" and \"beside\"). Our approach also introduces new techniques for responding to compound locative phrases of arbitrary complexity and proposes a new metric for disambiguation confidence. An experimental validation revealed our method can improve object disambiguation accuracy and performance over past approaches.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116155885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
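A toy illustration of the probabilistic scoring idea: each candidate object is scored by a feature-match term multiplied by a spatial term for the locative preposition, and the highest-scoring candidate wins. The Gaussian "next to" model and all numbers below are invented for illustration, not the paper's empirically trained models.

```python
# Sketch: combine a feature match score (e.g. color/shape mentioned in the
# utterance) with a spatial likelihood for the locative preposition relative to
# a reference object, then pick the highest-scoring candidate.
import math

def next_to_likelihood(obj_xy, ref_xy, sigma=0.3):
    """Toy model: 'next to' peaks at small distances from the reference object."""
    d = math.dist(obj_xy, ref_xy)
    return math.exp(-(d ** 2) / (2 * sigma ** 2))

def disambiguate(candidates, feature_scores, ref_xy):
    # score(object) is proportional to P(features | object) * P(spatial relation | object)
    scored = {
        name: feature_scores[name] * next_to_likelihood(xy, ref_xy)
        for name, xy in candidates.items()
    }
    return max(scored, key=scored.get), scored

candidates = {"red_mug": (0.2, 0.1), "blue_mug": (0.9, 0.8)}
feature_scores = {"red_mug": 0.9, "blue_mug": 0.2}   # "the red mug next to the laptop"
print(disambiguate(candidates, feature_scores, ref_xy=(0.25, 0.15)))
```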
EyeLinks
Proceedings of the 20th ACM International Conference on Multimodal Interaction. Pub Date: 2018-10-02. DOI: 10.1145/3242969.3243021
P. Figueirêdo, M. J. Fonseca
{"title":"EyeLinks","authors":"P. Figueirêdo, M. J. Fonseca","doi":"10.1145/3242969.3243021","DOIUrl":"https://doi.org/10.1145/3242969.3243021","url":null,"abstract":"In this paper, we introduce a novel gaze-only interaction technique called EyeLinks, which was designed i) to support various types of discrete clickables (e.g. textual links, buttons, images, tabs, etc.); ii) to be easy to learn and use; iii) to mitigate the inaccuracy of affordable eye trackers. Our technique uses a two-step fixation approach: first, we assign numeric identifiers to clickables in the region where users gaze at and second, users select the desired clickable by performing a fixation on the corresponding confirm button, displayed in a sidebar. This two-step selection enables users to freely explore Web pages, avoids the Midas touch problem and improves accuracy. We evaluated our approach by comparing it against the mouse and another gaze-only technique (Actigaze). The results showed no statistically significant difference between EyeLinks and Actigaze, but users considered EyeLinks easier to learn and use than Actigaze and it was also the most preferred. Of the three, the mouse was the most accurate and efficient technique.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"24 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120937036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
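A rough sketch of the two-step fixation selection: clickables near the current gaze point get numeric identifiers, and a later fixation on the matching sidebar confirm button resolves the selection. The radius, data layout, and helper names are assumptions, not the EyeLinks implementation.

```python
# Step 1: label clickables near the gaze point with numeric identifiers.
# Step 2: a fixation on the matching sidebar confirm button selects the clickable.
import math

def label_clickables(gaze_xy, clickables, radius=150):
    """Assign 1, 2, 3, ... to clickables within `radius` px of the gaze point."""
    near = [c for c in clickables if math.dist(gaze_xy, c["center"]) <= radius]
    return {i + 1: c for i, c in enumerate(near)}

def resolve_fixation(labelled, fixated_button_id):
    """Return the clickable whose confirm button the user fixated, if any."""
    return labelled.get(fixated_button_id)

clickables = [
    {"name": "Home link", "center": (100, 90)},
    {"name": "Search button", "center": (180, 130)},
    {"name": "Footer link", "center": (900, 1000)},
]
labelled = label_clickables(gaze_xy=(120, 100), clickables=clickables)
print({k: v["name"] for k, v in labelled.items()})              # {1: 'Home link', 2: 'Search button'}
print(resolve_fixation(labelled, fixated_button_id=2)["name"])  # Search button
```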
Deep Recurrent Multi-instance Learning with Spatio-temporal Features for Engagement Intensity Prediction
Proceedings of the 20th ACM International Conference on Multimodal Interaction. Pub Date: 2018-10-02. DOI: 10.1145/3242969.3264981
Jianfei Yang, Kai Wang, Xiaojiang Peng, Y. Qiao
{"title":"Deep Recurrent Multi-instance Learning with Spatio-temporal Features for Engagement Intensity Prediction","authors":"Jianfei Yang, Kai Wang, Xiaojiang Peng, Y. Qiao","doi":"10.1145/3242969.3264981","DOIUrl":"https://doi.org/10.1145/3242969.3264981","url":null,"abstract":"This paper elaborates the winner approach for engagement intensity prediction in the EmotiW Challenge 2018. The task is to predict the engagement level of a subject when he or she is watching an educational video in diverse conditions and different environments. Our approach formulates the prediction task as a multi-instance regression problem. We divide an input video sequence into segments and calculate the temporal and spatial features of each segment for regressing the intensity. Subject engagement, that is intuitively related with body and face changes in time domain, can be characterized by long short-term memory (LSTM) network. Hence, we build a multi-modal regression model based on multi-instance mechanism as well as LSTM. To make full use of training and validation data, we train different models for different data split and conduct model ensemble finally. Experimental results show that our method achieves mean squared error (MSE) of 0.0717 in the validation set, which improves the baseline results by 28%. Our methods finally win the challenge with MSE of 0.0626 on the testing set.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"161 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122618831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 61
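A minimal sketch of the multi-instance regression formulation: per-segment spatio-temporal features (the "instances") are passed through an LSTM, pooled into one engagement score per video, and trained with MSE. The feature size, hidden size, and mean pooling are assumptions, not the winning configuration.

```python
# Sketch: an LSTM over per-segment features, pooled into a single engagement
# intensity per video and trained with mean squared error.
import torch
import torch.nn as nn

class SegmentLSTMRegressor(nn.Module):
    def __init__(self, feat_dim=64, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, segment_feats):            # (batch, n_segments, feat_dim)
        out, _ = self.lstm(segment_feats)
        scores = self.head(out).squeeze(-1)      # per-segment intensity estimates
        return scores.mean(dim=1)                # pool instances into one video score

model = SegmentLSTMRegressor()
feats = torch.randn(8, 10, 64)                   # 8 videos, 10 segments each
target = torch.rand(8)                           # engagement intensity in [0, 1]
loss = nn.functional.mse_loss(model(feats), target)
print(loss.item())
```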
I Smell Trouble: Using Multiple Scents To Convey Driving-Relevant Information
Proceedings of the 20th ACM International Conference on Multimodal Interaction. Pub Date: 2018-10-02. DOI: 10.1145/3242969.3243015
D. Dmitrenko, E. Maggioni, Marianna Obrist
{"title":"I Smell Trouble: Using Multiple Scents To Convey Driving-Relevant Information","authors":"D. Dmitrenko, E. Maggioni, Marianna Obrist","doi":"10.1145/3242969.3243015","DOIUrl":"https://doi.org/10.1145/3242969.3243015","url":null,"abstract":"Cars provide drivers with task-related information (e.g. \"Fill gas\") mainly using visual and auditory stimuli. However, those stimuli may distract or overwhelm the driver, causing unnecessary stress. Here, we propose olfactory stimulation as a novel feedback modality to support the perception of visual notifications, reducing the visual demand of the driver. Based on previous research, we explore the application of the scents of lavender, peppermint, and lemon to convey three driving-relevant messages (i.e. \"Slow down\", \"Short inter-vehicle distance\", \"Lane departure\"). Our paper is the first to demonstrate the application of olfactory conditioning in the context of driving and to explore how multiple olfactory notifications change the driving behaviour. Our findings demonstrate that olfactory notifications are perceived as less distracting, more comfortable, and more helpful than visual notifications. Drivers also make less driving mistakes when exposed to olfactory notifications. We discuss how these findings inform the design of future in-car user interfaces.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124618490","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 28
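As a toy illustration only: the three driving-relevant messages from the abstract could be dispatched as olfactory notifications through a simple mapping. The particular scent-to-message pairing and the emit() callback are invented placeholders; the abstract does not specify the pairing, and a real system would drive a scent-delivery device.

```python
# Hypothetical scent-to-message mapping; the pairing shown is illustrative only
# and is NOT taken from the study. emit() stands in for a scent-delivery device.
SCENT_FOR_MESSAGE = {
    "Slow down": "lavender",
    "Short inter-vehicle distance": "peppermint",
    "Lane departure": "lemon",
}

def notify(message, emit=print):
    scent = SCENT_FOR_MESSAGE.get(message)
    if scent is None:
        raise ValueError(f"No scent assigned to message: {message!r}")
    emit(f"Releasing {scent} for '{message}'")

notify("Lane departure")   # Releasing lemon for 'Lane departure'
```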