Proceedings of the 2015 ACM on International Conference on Multimodal Interaction: Latest Publications

2015 Multimodal Learning and Analytics Grand Challenge
M. Worsley, K. Chiluiza, Joseph F. Grafsgaard, X. Ochoa
{"title":"2015 Multimodal Learning and Analytics Grand Challenge","authors":"M. Worsley, K. Chiluiza, Joseph F. Grafsgaard, X. Ochoa","doi":"10.1145/2818346.2829995","DOIUrl":"https://doi.org/10.1145/2818346.2829995","url":null,"abstract":"Multimodality is an integral part of teaching and learning. Over the past few decades researchers have been designing, creating and analyzing novel environments that enable students to experience and demonstrate learning through a variety of modalities. The recent availability of low cost multimodal sensors, advances in artificial intelligence and improved techniques for large scale data analysis have enabled researchers and practitioners to push the boundaries on multimodal learning and multimodal learning analytics. In an effort to continue these developments, the 2015 Multimodal Learning and Analytics Grand Challenge includes a combined focus on new techniques to capture multimodal learning data, as well as the development of rich, multimodal learning applications.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"4 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87776383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
Gestimator: Shape and Stroke Similarity Based Gesture Recognition
Yina Ye, P. Nurmi
{"title":"Gestimator: Shape and Stroke Similarity Based Gesture Recognition","authors":"Yina Ye, P. Nurmi","doi":"10.1145/2818346.2820734","DOIUrl":"https://doi.org/10.1145/2818346.2820734","url":null,"abstract":"Template-based approaches are currently the most popular gesture recognition solution for interactive systems as they provide accurate and runtime efficient performance in a wide range of applications. The basic idea in these approaches is to measure similarity between a user gesture and a set of pre-recorded templates, and to determine the appropriate gesture type using a nearest neighbor classifier. While simple and elegant, this approach performs well only when the gestures are relatively simple and unambiguous. In increasingly many scenarios, such as authentication, interactive learning, and health care applications, the gestures of interest are complex, consist of multiple sub-strokes, and closely resemble other gestures. Merely considering the shape of the gesture is not sufficient for these scenarios, and robust identification of the constituent sequence of sub-strokes is also required. The present paper contributes by introducing Gestimator, a novel gesture recognizer that combines shape and stroke-based similarity into a sequential classification framework for robust gesture recognition. Experiments carried out using three datasets demonstrate significant performance gains compared to current state-of-the-art techniques. The performance improvements are highest for complex gestures, but consistent improvements are achieved even for simple and widely studied gesture types.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"22 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86431193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 17
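The Gestimator entry above builds on the standard template-matching baseline: a user gesture is compared against pre-recorded templates and classified with a nearest-neighbor rule over a shape-similarity measure. The sketch below illustrates only that generic baseline, not Gestimator's combined shape-and-stroke framework; the resampling length, normalization, and function names are assumptions for illustration.

```python
import numpy as np

def resample(points, n=64):
    """Resample a 2-D stroke (sequence of (x, y) points) to n evenly spaced points."""
    pts = np.asarray(points, dtype=float)
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    cum = np.concatenate([[0.0], np.cumsum(seg)])
    targets = np.linspace(0.0, cum[-1], n)
    x = np.interp(targets, cum, pts[:, 0])
    y = np.interp(targets, cum, pts[:, 1])
    return np.stack([x, y], axis=1)

def normalize(stroke):
    """Translate to the centroid and scale to a unit bounding box."""
    s = stroke - stroke.mean(axis=0)
    extent = s.max(axis=0) - s.min(axis=0)
    return s / max(extent.max(), 1e-9)

def shape_distance(a, b):
    """Mean point-to-point Euclidean distance between two resampled strokes."""
    return float(np.mean(np.linalg.norm(a - b, axis=1)))

def classify(gesture, templates):
    """Nearest-neighbor classification against labeled template strokes.

    templates: list of (label, points) pairs recorded beforehand.
    Returns the label of the closest template.
    """
    g = normalize(resample(gesture))
    scored = [(shape_distance(g, normalize(resample(t))), label)
              for label, t in templates]
    return min(scored, key=lambda s: s[0])[1]
```

As the abstract notes, a purely shape-based distance like this breaks down for multi-stroke, ambiguous gestures, which is the gap the paper's sequential stroke-aware classifier addresses.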
Evaluating Speech, Face, Emotion and Body Movement Time-series Features for Automated Multimodal Presentation Scoring
Vikram Ramanarayanan, C. W. Leong, L. Chen, G. Feng, David Suendermann-Oeft
{"title":"Evaluating Speech, Face, Emotion and Body Movement Time-series Features for Automated Multimodal Presentation Scoring","authors":"Vikram Ramanarayanan, C. W. Leong, L. Chen, G. Feng, David Suendermann-Oeft","doi":"10.1145/2818346.2820765","DOIUrl":"https://doi.org/10.1145/2818346.2820765","url":null,"abstract":"We analyze how fusing features obtained from different multimodal data streams such as speech, face, body movement and emotion tracks can be applied to the scoring of multimodal presentations. We compute both time-aggregated and time-series based features from these data streams--the former being statistical functionals and other cumulative features computed over the entire time series, while the latter, dubbed histograms of cooccurrences, capture how different prototypical body posture or facial configurations co-occur within different time-lags of each other over the evolution of the multimodal, multivariate time series. We examine the relative utility of these features, along with curated speech stream features in predicting human-rated scores of multiple aspects of presentation proficiency. We find that different modalities are useful in predicting different aspects, even outperforming a naive human inter-rater agreement baseline for a subset of the aspects analyzed.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"46 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83886429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 45
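As a rough illustration of the "histograms of co-occurrences" idea described above (counting how prototypical states co-occur at different time lags), here is a hedged sketch that, for each lag, tallies how often each pair of prototype labels appears that many frames apart. The lag set, normalization, and prototype labels are assumptions; the authors' exact formulation may differ.

```python
import numpy as np

def cooccurrence_histogram(labels, num_prototypes, lags=(1, 2, 5, 10)):
    """Histogram-of-co-occurrences features for a sequence of prototype IDs.

    labels: 1-D integer sequence, one prototype ID (e.g., posture cluster) per frame.
    Returns a vector of length len(lags) * num_prototypes**2, counting how often
    prototype i at time t co-occurs with prototype j at time t + lag.
    """
    labels = np.asarray(labels)
    feats = []
    for lag in lags:
        hist = np.zeros((num_prototypes, num_prototypes))
        for a, b in zip(labels[:-lag], labels[lag:]):
            hist[a, b] += 1
        # Normalize so sequences of different lengths are comparable.
        hist /= max(hist.sum(), 1.0)
        feats.append(hist.ravel())
    return np.concatenate(feats)

# Example: 3 prototypical posture clusters observed over 12 frames.
frames = [0, 0, 1, 1, 2, 2, 0, 1, 2, 0, 0, 1]
features = cooccurrence_histogram(frames, num_prototypes=3)
```

Such a feature vector could then be fed to any regressor alongside the time-aggregated statistical functionals the abstract mentions.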
Look & Pedal: Hands-free Navigation in Zoomable Information Spaces through Gaze-supported Foot Input
Konstantin Klamka, A. Siegel, Stefan Vogt, F. Göbel, S. Stellmach, Raimund Dachselt
{"title":"Look & Pedal: Hands-free Navigation in Zoomable Information Spaces through Gaze-supported Foot Input","authors":"Konstantin Klamka, A. Siegel, Stefan Vogt, F. Göbel, S. Stellmach, Raimund Dachselt","doi":"10.1145/2818346.2820751","DOIUrl":"https://doi.org/10.1145/2818346.2820751","url":null,"abstract":"For a desktop computer, we investigate how to enhance conventional mouse and keyboard interaction by combining the input modalities gaze and foot. This multimodal approach offers the potential for fluently performing both manual input (e.g., for precise object selection) and gaze-supported foot input (for pan and zoom) in zoomable information spaces in quick succession or even in parallel. For this, we take advantage of fast gaze input to implicitly indicate where to navigate to and additional explicit foot input for speed control while leaving the hands free for further manual input. This allows for taking advantage of gaze input in a subtle and unobtrusive way. We have carefully elaborated and investigated three variants of foot controls incorporating one-, two- and multidirectional foot pedals in combination with gaze. These were evaluated and compared to mouse-only input in a user study using Google Earth as a geographic information system. The results suggest that gaze-supported foot input is feasible for convenient, user-friendly navigation and comparable to mouse input and encourage further investigations of gaze-supported foot controls.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79590685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 41
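The Look & Pedal abstract above states the interaction principle: gaze implicitly indicates where to navigate, while an explicit foot pedal controls speed. The sketch below is a hypothetical per-frame update loop under those assumptions (normalized gaze coordinates, a 0-1 pedal deflection, invented parameter names); it is not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class Viewport:
    cx: float = 0.0    # world x at the viewport center
    cy: float = 0.0    # world y at the viewport center
    zoom: float = 1.0  # scale factor (larger = closer)

def update_viewport(view, gaze_x, gaze_y, pedal, dt,
                    pan_speed=2.0, zoom_speed=1.5, zoom_in=True):
    """One frame of gaze-supported pan-and-zoom navigation.

    gaze_x, gaze_y: gaze point in normalized screen coords, -1..1 from the center.
    pedal: foot pedal deflection in 0..1; 0 means no movement at all.
    The gaze point sets the direction, the pedal sets the speed.
    """
    if pedal <= 0.0:
        return view
    # Pan toward the gazed-at location, scaled by pedal deflection and zoom level.
    view.cx += gaze_x * pan_speed * pedal * dt / view.zoom
    view.cy += gaze_y * pan_speed * pedal * dt / view.zoom
    # Zoom around the current center; a second pedal (or pedal direction)
    # could toggle between zooming in and out, as in the multidirectional variant.
    factor = 1.0 + zoom_speed * pedal * dt
    view.zoom *= factor if zoom_in else 1.0 / factor
    return view
```

Keeping the hands out of this loop entirely is what lets mouse and keyboard remain free for precise object selection, as the study emphasizes.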
Sharing Representations for Long Tail Computer Vision Problems
Samy Bengio
{"title":"Sharing Representations for Long Tail Computer Vision Problems","authors":"Samy Bengio","doi":"10.1145/2818346.2818348","DOIUrl":"https://doi.org/10.1145/2818346.2818348","url":null,"abstract":"The long tail phenomena appears when a small number of objects/words/classes are very frequent and thus easy to model, while many many more are rare and thus hard to model. This has always been a problem in machine learning. We start by explaining why representation sharing in general, and embedding approaches in particular, can help to represent tail objects. Several embedding approaches are presented, in increasing levels of complexity, to show how to tackle the long tail problem, from rare classes to unseen classes in image classification (the so-called zero-shot setting). Finally, we present our latest results on image captioning, which can be seen as an ultimate rare class problem since each image is attributed to a novel, yet structured, class in the form of a meaningful descriptive sentence.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"344 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79609075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 28
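The keynote abstract above argues that shared embedding spaces help with rare and unseen classes: images and class labels are mapped into a common space, and an image is scored against every label embedding, including labels never seen at training time. Here is a hedged, minimal sketch of that scoring step; the projection matrix, dimensions, and embeddings are placeholders, not the speaker's models.

```python
import numpy as np

def score_labels(image_features, label_embeddings, projection):
    """Score every class label for one image in a shared embedding space.

    image_features:   (d_img,) visual feature vector.
    label_embeddings: (num_classes, d_emb) label vectors, e.g. word embeddings;
                      rows can include classes unseen during training.
    projection:       (d_emb, d_img) learned map from image space to label space.
    Returns one cosine similarity per class.
    """
    z = projection @ image_features                       # project image into label space
    z = z / (np.linalg.norm(z) + 1e-9)
    lab = label_embeddings / (np.linalg.norm(label_embeddings, axis=1,
                                             keepdims=True) + 1e-9)
    return lab @ z

# Zero-shot use: append an unseen class's word embedding to label_embeddings
# and classify with argmax over the returned scores.
```

Because tail and unseen classes borrow structure from the label embedding space rather than requiring many training images of their own, the same scoring rule covers frequent, rare, and zero-shot classes.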
Touch Challenge '15: Recognizing Social Touch Gestures
Merel M. Jung, Laura Cang, M. Poel, Karon E Maclean
{"title":"Touch Challenge '15: Recognizing Social Touch Gestures","authors":"Merel M. Jung, Laura Cang, M. Poel, Karon E Maclean","doi":"10.1145/2818346.2829993","DOIUrl":"https://doi.org/10.1145/2818346.2829993","url":null,"abstract":"Advances in the field of touch recognition could open up applications for touch-based interaction in areas such as Human-Robot Interaction (HRI). We extended this challenge to the research community working on multimodal interaction with the goal of sparking interest in the touch modality and to promote exploration of the use of data processing techniques from other more mature modalities for touch recognition. Two data sets were made available containing labeled pressure sensor data of social touch gestures that were performed by touching a touch-sensitive surface with the hand. Each set was collected from similar sensor grids, but under conditions reflecting different application orientations: CoST: Corpus of Social Touch and HAART: The Human-Animal Affective Robot Touch gesture set. In this paper we describe the challenge protocol and summarize the results from the touch challenge hosted in conjunction with the 2015 ACM International Conference on Multimodal Interaction (ICMI). The most important outcomes of the challenges were: (1) transferring techniques from other modalities, such as image processing, speech, and human action recognition provided valuable feature sets; (2) gesture classification confusions were similar despite the various data processing methods used.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"31 10 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89972224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 35
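Both challenge corpora described above consist of time series of pressure frames from a sensor grid, each labeled with a social touch gesture. Purely as a hedged illustration of a simple baseline consistent with that setup (not any participant's method), the sketch below turns a frame sequence into a few summary features and trains a standard classifier; the frame shape, threshold, and feature choices are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def touch_features(frames):
    """Summary features for one touch gesture.

    frames: array of shape (T, H, W) with pressure values over time.
    """
    frames = np.asarray(frames, dtype=float)
    per_frame_sum = frames.sum(axis=(1, 2))                       # total pressure per frame
    contact_area = (frames > 0.1 * frames.max()).sum(axis=(1, 2))  # active cells per frame
    return np.array([
        frames.max(),           # peak pressure
        per_frame_sum.mean(),   # mean overall intensity
        per_frame_sum.std(),    # temporal variation (e.g., pat vs. stroke)
        contact_area.mean(),    # typical contact area
        len(frames),            # gesture duration in frames
    ])

def train_baseline(gestures, labels):
    """gestures: list of (T, H, W) recordings; labels: e.g. 'pat', 'stroke', 'hit'."""
    feats = np.stack([touch_features(g) for g in gestures])
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(feats, labels)
    return clf
```

The challenge outcome that image-, speech-, and action-recognition features transferred well suggests richer descriptors than these (e.g., spatio-temporal texture features) would be the natural next step.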
Public Speaking Training with a Multimodal Interactive Virtual Audience Framework
Mathieu Chollet, Kalin Stefanov, H. Prendinger, Stefan Scherer
{"title":"Public Speaking Training with a Multimodal Interactive Virtual Audience Framework","authors":"Mathieu Chollet, Kalin Stefanov, H. Prendinger, Stefan Scherer","doi":"10.1145/2818346.2823294","DOIUrl":"https://doi.org/10.1145/2818346.2823294","url":null,"abstract":"We have developed an interactive virtual audience platform for public speaking training. Users' public speaking behavior is automatically analyzed using multimodal sensors, and ultimodal feedback is produced by virtual characters and generic visual widgets depending on the user's behavior. The flexibility of our system allows to compare different interaction mediums (e.g. virtual reality vs normal interaction), social situations (e.g. one-on-one meetings vs large audiences) and trained behaviors (e.g. general public speaking performance vs specific behaviors).","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"22 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91200726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 18
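The system described above senses the speaker's behavior and drives feedback through virtual characters and visual widgets. Purely as a hypothetical illustration of such a mapping (the feature names, thresholds, and feedback events below are invented for this sketch and are not taken from the paper), one frame of a rule-based feedback policy could look like this:

```python
def audience_feedback(features,
                      min_gaze_at_audience=0.6,
                      max_pause_ratio=0.35,
                      min_loudness=0.4):
    """Map sensed speaking features (all normalized to 0..1) to feedback events.

    features: dict with keys 'gaze_at_audience', 'pause_ratio', 'loudness'.
    Returns a list of feedback events for the virtual characters / widgets.
    """
    events = []
    if features['gaze_at_audience'] < min_gaze_at_audience:
        events.append('audience_looks_away')     # nudge: make more eye contact
    if features['pause_ratio'] > max_pause_ratio:
        events.append('show_pacing_widget')      # too many or too long pauses
    if features['loudness'] < min_loudness:
        events.append('audience_leans_forward')  # prompt the speaker to speak up
    if not events:
        events.append('audience_nods')           # positive reinforcement
    return events
```

Swapping out the thresholds, the set of events, or the renderer (virtual-reality characters vs. on-screen widgets) is exactly the kind of variation the framework is designed to compare.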
Towards Attentive, Bi-directional MOOC Learning on Mobile Devices
Xiang Xiao, Jingtao Wang
{"title":"Towards Attentive, Bi-directional MOOC Learning on Mobile Devices","authors":"Xiang Xiao, Jingtao Wang","doi":"10.1145/2818346.2820754","DOIUrl":"https://doi.org/10.1145/2818346.2820754","url":null,"abstract":"AttentiveLearner is a mobile learning system optimized for consuming lecture videos in Massive Open Online Courses (MOOCs) and flipped classrooms. AttentiveLearner converts the built-in camera of mobile devices into both a tangible video control channel and an implicit heart rate sensing channel by analyzing the learner's fingertip transparency changes in real time. In this paper, we report disciplined research efforts in making AttentiveLearner truly practical in real-world use. Through two 18-participant user studies and follow-up analyses, we found that 1) the tangible video control interface is intuitive to use and efficient to operate; 2) heart rate signals implicitly captured by AttentiveLearner can be used to infer both the learner's interests and perceived confusion levels towards the corresponding learning topics; 3) AttentiveLearner can achieve significantly higher accuracy by predicting extreme personal learning events and aggregated learning events.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"54 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89584453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 30
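AttentiveLearner, as summarized above, derives an implicit heart rate signal from the fingertip covering the camera. A common way to realize this kind of camera-based photoplethysmography is to average frame brightness and count pulse peaks; the sketch below shows that generic approach only (the frame rate, detrending window, and peak spacing are assumptions, not the paper's pipeline).

```python
import numpy as np
from scipy.signal import find_peaks

def heart_rate_bpm(frames, fps=30.0):
    """Estimate heart rate from fingertip-covered camera frames.

    frames: array (T, H, W) of grayscale intensities; blood volume changes
    modulate how much light passes through the fingertip in each frame.
    Returns beats per minute, or None if too few beats were detected.
    """
    signal = np.asarray(frames, dtype=float).mean(axis=(1, 2))
    # Remove slow drift (ambient light, finger pressure) so only the pulse remains.
    window = int(fps)
    signal = signal - np.convolve(signal, np.ones(window) / window, mode='same')
    # Require peaks at least ~0.4 s apart, i.e. at most ~150 bpm.
    peaks, _ = find_peaks(signal, distance=int(0.4 * fps))
    if len(peaks) < 2:
        return None
    beat_interval = np.mean(np.diff(peaks)) / fps   # seconds per beat
    return 60.0 / beat_interval
```

In the paper's setting this implicit channel runs alongside the tangible video controls, so heart rate is gathered without any extra effort from the learner.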
Behavioral and Emotional Spoken Cues Related to Mental States in Human-Robot Social Interaction
Lucile Bechade, G. D. Duplessis, M. A. Sehili, L. Devillers
{"title":"Behavioral and Emotional Spoken Cues Related to Mental States in Human-Robot Social Interaction","authors":"Lucile Bechade, G. D. Duplessis, M. A. Sehili, L. Devillers","doi":"10.1145/2818346.2820777","DOIUrl":"https://doi.org/10.1145/2818346.2820777","url":null,"abstract":"Understanding human behavioral and emotional cues occurring in interaction has become a major research interest due to the emergence of numerous applications such as in social robotics. While there is agreement across different theories that some behavioral signals are involved in communicating information, there is a lack of consensus regarding their specificity, their universality, and whether they convey emotions, affective, cognitive, mental states or all of those. Our goal in this study is to explore the relationship between behavioral and emotional cues extracted from speech (e.g., laughter, speech duration, negative emotions) with different communicative information about the human participant. This study is based on a corpus of audio/video data of humorous interactions between the nao{} robot and 37 human participants. Participants filled three questionnaires about their personality, sense of humor and mental states regarding the interaction. This work reveals the existence of many links between behavioral and emotional cues and the mental states reported by human participants through self-report questionnaires. However, we have not found a clear connection between reported mental states and participants profiles.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"31 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76776505","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Multimodal Capture of Teacher-Student Interactions for Automated Dialogic Analysis in Live Classrooms
S. D’Mello, A. Olney, Nathaniel Blanchard, Borhan Samei, Xiaoyi Sun, Brooke Ward, Sean Kelly
{"title":"Multimodal Capture of Teacher-Student Interactions for Automated Dialogic Analysis in Live Classrooms","authors":"S. D’Mello, A. Olney, Nathaniel Blanchard, Borhan Samei, Xiaoyi Sun, Brooke Ward, Sean Kelly","doi":"10.1145/2818346.2830602","DOIUrl":"https://doi.org/10.1145/2818346.2830602","url":null,"abstract":"We focus on data collection designs for the automated analysis of teacher-student interactions in live classrooms with the goal of identifying instructional activities (e.g., lecturing, discussion) and assessing the quality of dialogic instruction (e.g., analysis of questions). Our designs were motivated by multiple technical requirements and constraints. Most importantly, teachers could be individually micfied but their audio needed to be of excellent quality for automatic speech recognition (ASR) and spoken utterance segmentation. Individual students could not be micfied but classroom audio quality only needed to be sufficient to detect student spoken utterances. Visual information could only be recorded if students could not be identified. Design 1 used an omnidirectional laptop microphone to record both teacher and classroom audio and was quickly deemed unsuitable. In Designs 2 and 3, teachers wore a wireless Samson AirLine 77 vocal headset system, which is a unidirectional microphone with a cardioid pickup pattern. In Design 2, classroom audio was recorded with dual first- generation Microsoft Kinects placed at the front corners of the class. Design 3 used a Crown PZM-30D pressure zone microphone mounted on the blackboard to record classroom audio. Designs 2 and 3 were tested by recording audio in 38 live middle school classrooms from six U.S. schools while trained human coders simultaneously performed live coding of classroom discourse. Qualitative and quantitative analyses revealed that Design 3 was suitable for three of our core tasks: (1) ASR on teacher speech (word recognition rate of 66% and word overlap rate of 69% using Google Speech ASR engine); (2) teacher utterance segmentation (F-measure of 97%); and (3) student utterance segmentation (F-measure of 66%). Ideas to incorporate video and skeletal tracking with dual second-generation Kinects to produce Design 4 are discussed.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"21 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72882724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 49
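The evaluation above reports F-measures for teacher and student utterance segmentation. As a hedged illustration of how such a score can be computed, the sketch below matches detected utterance intervals to reference intervals and derives precision, recall, and F-measure; the 50%-overlap matching criterion is an assumption for this sketch, not necessarily the paper's exact scoring rule.

```python
def interval_overlap(a, b):
    """Overlap in seconds between two (start, end) intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def segmentation_f_measure(detected, reference, min_overlap_ratio=0.5):
    """F-measure for utterance segmentation.

    detected, reference: lists of (start, end) times in seconds.
    A detected utterance counts as a hit if it overlaps some not-yet-matched
    reference utterance by at least min_overlap_ratio of that reference's length.
    """
    matched = set()
    hits = 0
    for d in detected:
        for i, r in enumerate(reference):
            if i in matched:
                continue
            if interval_overlap(d, r) >= min_overlap_ratio * (r[1] - r[0]):
                matched.add(i)
                hits += 1
                break
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(reference) if reference else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

The gap between the 97% teacher score and the 66% student score reflects the constraint that only the teacher could wear a close-talking microphone, with students captured by the room microphone alone.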