Proceedings of the 2015 ACM on International Conference on Multimodal Interaction: Latest Publications

Automatic Detection of Mind Wandering During Reading Using Gaze and Physiology
Authors: R. Bixler, Nathaniel Blanchard, L. Garrison, S. D’Mello
DOI: https://doi.org/10.1145/2818346.2820742
Published: 2015-11-09
Abstract: Mind wandering (MW) entails an involuntary shift in attention from task-related thoughts to task-unrelated thoughts, and has been shown to have detrimental effects on performance in a number of contexts. This paper proposes an automated multimodal detector of MW using eye gaze and physiology (skin conductance and skin temperature) and aspects of the context (e.g., time on task, task difficulty). Data in the form of eye gaze and physiological signals were collected as 178 participants read four instructional texts from a computer interface. Participants periodically provided self-reports of MW in response to pseudorandom auditory probes during reading. Supervised machine learning models trained on features extracted from participants' gaze fixations, physiological signals, and contextual cues were used to detect pages where participants provided positive responses of MW to the auditory probes. Two methods of combining gaze and physiology features were explored. Feature level fusion entailed building a single model by combining feature vectors from individual modalities. Decision level fusion entailed building individual models for each modality and adjudicating amongst individual decisions. Feature level fusion resulted in an 11% improvement in classification accuracy over the best unimodal model, but there was no comparable improvement for decision level fusion. This was reflected by a small improvement in both precision and recall. An analysis of the features indicated that MW was associated with fewer and longer fixations and saccades, and a higher, more deterministic skin temperature. Possible applications of the detector are discussed.
Citations: 30

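The abstract above contrasts feature-level and decision-level fusion. Below is a minimal sketch of that contrast using scikit-learn on synthetic stand-in data; the feature dimensions, the random-forest classifier, and the probability-averaging adjudication rule are illustrative assumptions, not the authors' pipeline.

```python
# Sketch of feature-level vs. decision-level fusion for a binary mind-wandering detector.
# Synthetic arrays stand in for the gaze and physiology features; the classifier and the
# probability-averaging rule are illustrative choices only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n_pages = 400
gaze = rng.normal(size=(n_pages, 12))      # e.g., fixation counts/durations per page
physio = rng.normal(size=(n_pages, 6))     # e.g., skin conductance / temperature statistics
y = rng.integers(0, 2, size=n_pages)       # self-reported mind wandering (1) or not (0)

# Feature-level fusion: concatenate modality feature vectors and train a single model.
fused = np.hstack([gaze, physio])
pred_feat = cross_val_predict(RandomForestClassifier(random_state=0), fused, y, cv=5)

# Decision-level fusion: train one model per modality, then adjudicate between their
# outputs (here: average the predicted probabilities and threshold at 0.5).
p_gaze = cross_val_predict(RandomForestClassifier(random_state=0), gaze, y,
                           cv=5, method="predict_proba")[:, 1]
p_phys = cross_val_predict(RandomForestClassifier(random_state=0), physio, y,
                           cv=5, method="predict_proba")[:, 1]
pred_dec = ((p_gaze + p_phys) / 2 > 0.5).astype(int)

print("feature-level fusion accuracy:", accuracy_score(y, pred_feat))
print("decision-level fusion accuracy:", accuracy_score(y, pred_dec))
```
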
Multimodal Affect Detection in the Wild: Accuracy, Availability, and Generalizability
Authors: Nigel Bosch
DOI: https://doi.org/10.1145/2818346.2823316
Published: 2015-11-09
Abstract: Affect detection is an important component of computerized learning environments that adapt the interface and materials to students' affect. This paper proposes a plan for developing and testing multimodal affect detectors that generalize across differences in data that are likely to occur in practical applications (e.g., time, demographic variables). Facial features and interaction log features are considered as modalities for affect detection in this scenario, each with their own advantages. Results are presented for completed work evaluating the accuracy of individual modality face- and interaction-based detectors, accuracy and availability of a multimodal combination of these modalities, and initial steps toward generalization of face-based detectors. Additional data collection needed for cross-culture generalization testing is also completed. Challenges and possible solutions for proposed cross-cultural generalization testing of multimodal detectors are also discussed.
Citations: 9

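One practical point raised above is modality availability: face features drop out whenever no face is detected, while interaction logs are always present. The hedged sketch below shows one way such a fallback could look, with hypothetical logistic-regression models trained on synthetic data; the averaging fusion rule and the models themselves are assumptions, not the detectors described in the paper.

```python
# Sketch of a multimodal affect score that tolerates a missing face channel: average the
# two modality probabilities when both are available, otherwise fall back to the
# interaction-log model alone. Models, features, and fusion rule are illustrative only.
from typing import Optional
import numpy as np
from sklearn.linear_model import LogisticRegression

def fused_affect_probability(face_model: LogisticRegression,
                             log_model: LogisticRegression,
                             face_feats: Optional[np.ndarray],
                             log_feats: np.ndarray) -> float:
    """Return P(affect present); face_feats is None when no face was detected."""
    p_log = log_model.predict_proba(log_feats.reshape(1, -1))[0, 1]
    if face_feats is None:                       # face channel unavailable
        return p_log
    p_face = face_model.predict_proba(face_feats.reshape(1, -1))[0, 1]
    return (p_face + p_log) / 2.0

# Hypothetical training on synthetic data, just to make the sketch runnable.
rng = np.random.default_rng(1)
X_face, X_log = rng.normal(size=(200, 8)), rng.normal(size=(200, 5))
y = rng.integers(0, 2, size=200)
face_model = LogisticRegression(max_iter=1000).fit(X_face, y)
log_model = LogisticRegression(max_iter=1000).fit(X_log, y)
print(fused_affect_probability(face_model, log_model, None, X_log[0]))       # face missing
print(fused_affect_probability(face_model, log_model, X_face[0], X_log[0]))  # both available
```
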
Who's Speaking?: Audio-Supervised Classification of Active Speakers in Video
Authors: Punarjay Chakravarty, S. Mirzaei, T. Tuytelaars, H. V. hamme
DOI: https://doi.org/10.1145/2818346.2820780
Published: 2015-11-09
Abstract: Active speakers have traditionally been identified in video by detecting their moving lips. This paper demonstrates the same using spatio-temporal features that aim to capture other cues: movement of the head, upper body and hands of active speakers. Speaker directional information, obtained using sound source localization from a microphone array, is used to supervise the training of these video features.
Citations: 35

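A rough sketch of the audio-supervision idea described above: a direction-of-arrival (DOA) estimate from the microphone array picks the tracked person closest to the sound direction, and those weak labels then supervise a classifier over per-person video features. The matching tolerance, feature shapes, and classifier choice are assumptions for illustration only.

```python
# Sketch of audio-supervised labelling: per-frame audio DOA estimates are matched to
# tracked persons' camera-relative azimuths to produce weak "who is speaking" labels,
# which then train a per-person speaking/not-speaking classifier on video features.
import numpy as np
from sklearn.linear_model import LogisticRegression

def weak_labels_from_doa(doa_deg: np.ndarray, person_azimuths_deg: np.ndarray,
                         tol_deg: float = 15.0) -> np.ndarray:
    """For each frame, return the index of the person closest to the audio DOA,
    or -1 if nobody is within tol_deg."""
    diffs = np.abs(person_azimuths_deg[None, :] - doa_deg[:, None])   # (frames, persons)
    nearest = diffs.argmin(axis=1)
    return np.where(diffs[np.arange(len(doa_deg)), nearest] <= tol_deg, nearest, -1)

rng = np.random.default_rng(2)
person_azimuths = np.array([-30.0, 0.0, 40.0])     # assumed camera-relative positions
doa = rng.uniform(-60, 60, size=500)               # per-frame DOA estimates (degrees)
speaker_idx = weak_labels_from_doa(doa, person_azimuths)

# Train "is this person speaking?" on per-person video features using the weak labels.
video_feats = rng.normal(size=(500, 3, 32))        # (frames, persons, feature_dim)
X = video_feats.reshape(-1, 32)
y = np.array([[1 if speaker_idx[f] == p else 0 for p in range(3)]
              for f in range(500)]).ravel()
clf = LogisticRegression(max_iter=1000).fit(X, y)
```
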
Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding
Authors: Yun-Nung (Vivian) Chen, Ming Sun, Alexander I. Rudnicky, A. Gershman
DOI: https://doi.org/10.1145/2818346.2820781
Published: 2015-11-09
Abstract: Spoken language interfaces are appearing in various smart devices (e.g. smart-phones, smart-TV, in-car navigating systems) and serve as intelligent assistants (IAs). However, most of them do not consider individual users' behavioral profiles and contexts when modeling user intents. Such behavioral patterns are user-specific and provide useful cues to improve spoken language understanding (SLU). This paper focuses on leveraging the app behavior history to improve spoken dialog systems performance. We developed a matrix factorization approach that models speech and app usage patterns to predict user intents (e.g. launching a specific app). We collected multi-turn interactions in a WoZ scenario; users were asked to reproduce the multi-app tasks that they had performed earlier on their smart-phones. By modeling latent semantics behind lexical and behavioral patterns, the proposed multi-model system achieves about 52% of turn accuracy for intent prediction on ASR transcripts.
Citations: 32

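A simplified sketch of the matrix-factorization idea: dialogue turns form the rows of a joint matrix over lexical features and app-launch indicators, a low-rank factorization reconstructs that matrix, and the app columns of the reconstruction are scored to predict intent. The feature construction, the use of NMF, and the rank are illustrative assumptions rather than the paper's model.

```python
# Sketch of intent (app) prediction via matrix factorization over a joint
# turn-by-(lexical + app) matrix. Synthetic data; rank and features are placeholders.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(3)
n_turns, n_words, n_apps = 300, 50, 10
lexical = (rng.random((n_turns, n_words)) < 0.1).astype(float)     # bag-of-words from ASR
apps = np.zeros((n_turns, n_apps))
apps[np.arange(n_turns), rng.integers(0, n_apps, n_turns)] = 1.0   # app launched in the turn

M = np.hstack([lexical, apps])                     # joint turn-by-feature matrix
model = NMF(n_components=8, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(M)
M_hat = W @ model.components_                      # low-rank reconstruction

# Score the app columns of the reconstruction and pick the highest-scoring app per turn.
predicted_app = M_hat[:, n_words:].argmax(axis=1)
print("turn-level accuracy on the training matrix:",
      (predicted_app == apps.argmax(axis=1)).mean())
```
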
Recurrent Neural Networks for Emotion Recognition in Video
Authors: S. Kahou, Vincent Michalski, K. Konda, R. Memisevic, C. Pal
DOI: https://doi.org/10.1145/2818346.2830596
Published: 2015-11-09
Abstract: Deep learning based approaches to facial analysis and video analysis have recently demonstrated high performance on a variety of key tasks such as face recognition, emotion recognition and activity recognition. In the case of video, information often must be aggregated across a variable length sequence of frames to produce a classification result. Prior work using convolutional neural networks (CNNs) for emotion recognition in video has relied on temporal averaging and pooling operations reminiscent of widely used approaches for the spatial aggregation of information. Recurrent neural networks (RNNs) have seen an explosion of recent interest as they yield state-of-the-art performance on a variety of sequence analysis tasks. RNNs provide an attractive framework for propagating information over a sequence using a continuous valued hidden layer representation. In this work we present a complete system for the 2015 Emotion Recognition in the Wild (EmotiW) Challenge. We focus our presentation and experimental analysis on a hybrid CNN-RNN architecture for facial expression analysis that can outperform a previously applied CNN approach using temporal averaging for aggregation.
Citations: 321

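A minimal sketch of the aggregation contrast described above: per-frame CNN features are either averaged over time and classified, or passed through an RNN whose final hidden state is classified. The random frame features and the GRU/linear layer sizes are placeholders, not the EmotiW system's architecture.

```python
# Sketch of temporal averaging vs. RNN aggregation over per-frame CNN features.
# The CNN is faked with random features; layer sizes are illustrative only.
import torch
import torch.nn as nn

n_frames, feat_dim, n_classes = 16, 128, 7          # 7 emotion classes, as in EmotiW
frame_feats = torch.randn(1, n_frames, feat_dim)    # (batch, time, features) from a CNN

# (a) Temporal averaging: collapse time with a mean, then classify.
avg_classifier = nn.Linear(feat_dim, n_classes)
logits_avg = avg_classifier(frame_feats.mean(dim=1))

# (b) RNN aggregation: a GRU propagates information across frames; classify its
# final hidden state instead of a simple average.
gru = nn.GRU(input_size=feat_dim, hidden_size=64, batch_first=True)
rnn_classifier = nn.Linear(64, n_classes)
_, h_n = gru(frame_feats)                           # h_n: (num_layers, batch, hidden)
logits_rnn = rnn_classifier(h_n[-1])

print(logits_avg.shape, logits_rnn.shape)           # both: torch.Size([1, 7])
```
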
Detecting Mastication: A Wearable Approach
Authors: Abdelkareem Bedri, Apoorva Verlekar, Edison Thomaz, Valerie Avva, Thad Starner
DOI: https://doi.org/10.1145/2818346.2820767
Published: 2015-11-09
Abstract: We explore using the Outer Ear Interface (OEI) to recognize eating activities. OEI contains a 3D gyroscope and a set of proximity sensors encapsulated in an off-the-shelf earpiece to monitor jaw movement by measuring ear canal deformation. In a laboratory setting with 20 participants, OEI could distinguish eating from other activities, such as walking, talking, and silently reading, with over 90% accuracy (user independent). In a second study, six subjects wore the system for 6 hours each while performing their normal daily activities. OEI correctly classified five minute segments of time as eating or non-eating with 93% accuracy (user dependent).
Citations: 54

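A sketch of the segment-level evaluation described above: sensor streams are cut into fixed-length segments, each segment is summarized with simple statistics, and a classifier labels it as eating or non-eating. The sampling rate, features, and classifier below are assumptions; the OEI hardware and its actual feature set are not reproduced here.

```python
# Sketch of five-minute-segment classification from gyroscope and proximity channels.
# Synthetic data; window length, statistics, and classifier are illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def segment_features(gyro: np.ndarray, prox: np.ndarray) -> np.ndarray:
    """Summarize one segment: per-channel mean, std, and mean absolute difference."""
    chans = np.hstack([gyro, prox])                  # (samples, channels)
    return np.hstack([chans.mean(0), chans.std(0),
                      np.abs(np.diff(chans, axis=0)).mean(0)])

rng = np.random.default_rng(4)
fs = 50                                              # assumed sampling rate (Hz)
seg_len = 5 * 60 * fs                                # five-minute segments
X, y = [], []
for _ in range(60):                                  # synthetic labeled segments
    eating = int(rng.integers(0, 2))
    gyro = rng.normal(scale=1.0 + eating, size=(seg_len, 3))   # 3-axis gyroscope
    prox = rng.normal(scale=1.0 + eating, size=(seg_len, 4))   # proximity channels
    X.append(segment_features(gyro, prox))
    y.append(eating)
clf = RandomForestClassifier(random_state=0).fit(np.array(X), np.array(y))
```
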
Implicit Human-computer Interaction: Two Complementary Approaches
Authors: Julia Wache
DOI: https://doi.org/10.1145/2818346.2823311
Published: 2015-11-09
Abstract: One of the main goals in Human Computer Interaction (HCI) is improving the interface between users and computers: Interfacing should be intuitive, effortless and easy to learn. We approach the goal from two opposite but complementary directions: On the one hand, computer-user interaction can be enhanced if the computer can assess users' differences in an automated manner. Therefore we collected physiological and psychological data from people exposed to emotional stimuli and created a database for the community to use for further research in the context of automated learning to detect the differences in the inner states of users. We employed the data not only to predict the emotional state of users but also their personality traits. On the other hand, users need information dispatched by a computer to be easily, intuitively accessible. To minimize the cognitive effort of assimilating information we use a tactile device in the form of a belt and test how it can be best used to replace or augment the information received from other senses (e.g., visual and auditory) in a navigation task. We investigate how both approaches can be combined to improve specific applications.
Citations: 2

Classification of Children's Social Dominance in Group Interactions with Robots
Authors: Sarah Strohkorb, Iolanda Leite, Natalie Warren, B. Scassellati
DOI: https://doi.org/10.1145/2818346.2820735
Published: 2015-11-09
Abstract: As social robots become more widespread in educational environments, their ability to understand group dynamics and engage multiple children in social interactions is crucial. Social dominance is a highly influential factor in social interactions, expressed through both verbal and nonverbal behaviors. In this paper, we present a method for determining whether a participant is high or low in social dominance in a group interaction with children and robots. We investigated the correlation between many verbal and nonverbal behavioral features with social dominance levels collected through teacher surveys. We additionally implemented Logistic Regression and Support Vector Machines models with classification accuracies of 81% and 89%, respectively, showing that using a small subset of nonverbal behavioral features, these models can successfully classify children's social dominance level. Our approach for classifying social dominance is novel not only for its application to children, but also for achieving high classification accuracies using a reduced set of nonverbal features that, in future work, can be automatically extracted with current sensing technology.
Citations: 26

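A small sketch of the two classifiers reported above (logistic regression and an SVM), cross-validated on a handful of nonverbal behavioral features; the feature set and the synthetic measurements are placeholders, not the study's data.

```python
# Sketch of high/low social-dominance classification with Logistic Regression and an SVM
# on a small nonverbal feature set. Synthetic data stands in for the coded behaviors.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
n_children, n_features = 40, 6        # e.g., speaking time, gaze received, gestures, ...
X = rng.normal(size=(n_children, n_features))
y = rng.integers(0, 2, size=n_children)       # high (1) vs. low (0) social dominance

for name, clf in [("LogisticRegression", LogisticRegression(max_iter=1000)),
                  ("SVM", SVC(kernel="rbf"))]:
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: mean CV accuracy = {scores.mean():.2f}")
```
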
Exploring Intent-driven Multimodal Interface for Geographical Information System
Authors: Feng Sun
DOI: https://doi.org/10.1145/2818346.2823304
Published: 2015-11-09
Abstract: Geographic Information Systems (GIS) offers a large amount of functions for performing spatial analysis and geospatial information retrieval. However, off-the-shelf GIS remains difficult to use for occasional GIS experts. The major problem lies in that its interface organizes spatial analysis tools and functions according to spatial data structures and corresponding algorithms, which is conceptually confusing and cognitively complex. Prior work identified the usability problem of conventional GIS interface and developed alternatives based on speech or gesture to narrow the gap between the high-functionality provided by GIS and its usability. This paper outlined my doctoral research goal in understanding human-GIS interaction activity, especially how interaction modalities assist to capture spatial analysis intention and influence collaborative spatial problem solving. We proposed a framework for enabling multimodal human-GIS interaction driven by intention. We also implemented a prototype GeoEASI (Geo-dialogue Environment for Assisted Spatial Inquiry) to demonstrate the effectiveness of our framework. GeoEASI understands commonly known spatial analysis intentions through multimodal techniques and is able to assist users to perform spatial analysis with proper strategies. Further work will evaluate the effectiveness of our framework, improve the reliability and flexibility of the system, extend the GIS interface for supporting multiple users, and integrate the system into GeoDeliberation. We will concentrate on how multimodality technology can be adopted in these circumstances and explore the potentials of it. The study aims to demonstrate the feasibility of building a GIS to be both useful and usable by introducing an intent-driven multimodal interface, forming the key to building a better theory of spatial thinking for GIS.
Citations: 1

Detecting and Synthesizing Synchronous Joint Action in Human-Robot Teams
Authors: T. Iqbal, L. Riek
DOI: https://doi.org/10.1145/2818346.2823315
Published: 2015-11-09
Abstract: To become capable teammates to people, robots need the ability to interpret human activities and appropriately adjust their actions in real time. The goal of our research is to build robots that can work fluently and contingently with human teams. To this end, we have designed novel nonlinear dynamical methods to automatically model and detect synchronous joint action (SJA) in human teams. We also have extended this work to enable robots to move jointly with human teammates in real time. In this paper, we describe our work to date, and discuss our future research plans to further explore this research space. The results of this work are expected to benefit researchers in social signal processing, human-machine interaction, and robotics.
Citations: 3

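The abstract does not spell out the nonlinear dynamical method, so the sketch below only illustrates the general task of scoring synchrony between two teammates' movement signals over sliding windows, using lag-tolerant windowed correlation purely as a stand-in metric (explicitly not the authors' technique).

```python
# Stand-in illustration of synchrony scoring between two movement signals: for each
# sliding window, take the best absolute correlation over a small range of lags.
import numpy as np

def windowed_synchrony(a: np.ndarray, b: np.ndarray,
                       win: int = 100, max_lag: int = 10) -> np.ndarray:
    """Per window, best absolute Pearson correlation of a against lagged copies of b."""
    scores = []
    for start in range(max_lag, len(a) - win - max_lag, win):
        x = a[start:start + win]
        best = max(abs(np.corrcoef(x, b[start + lag:start + lag + win])[0, 1])
                   for lag in range(-max_lag, max_lag + 1))
        scores.append(best)
    return np.array(scores)

t = np.linspace(0, 20, 2000)
person = np.sin(2 * np.pi * 0.5 * t) + 0.1 * np.random.default_rng(6).normal(size=t.size)
robot = np.roll(person, 5)                       # robot roughly follows with a short delay
print(windowed_synchrony(person, robot)[:5])     # values near 1 indicate synchrony
```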