Proceedings of the Workshop on Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction: Latest Publications

Assessment of users' interests in multimodal dialog based on exchange unit
Sayaka Tomimasu, Masahiro Araki
DOI: 10.1145/3011263.3011269 (https://doi.org/10.1145/3011263.3011269)
Abstract: A person is more likely to enjoy long-term conversations with a robot if it can infer the topics that interest the person. In this paper, we propose a method of deducing the specific topics that interest a user by sequentially assessing each exchange in a chat-oriented dialog session. We use multimodal information such as facial expressions and prosodic information obtained from the user's utterances to assess interest, as these parameters are independent of the linguistic information that varies widely in chat-oriented dialogs. The results show that the assessment of the user's interest is more accurate when both types of features are used.
Published: 2016-11-12
Citations: 6
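As a rough illustration of the setup the abstract describes, the sketch below fuses per-exchange facial and prosodic features and trains a binary interest classifier. The feature dimensions, the synthetic data, and the choice of an SVM are assumptions for illustration, not the authors' exact pipeline.

```python
# Minimal sketch: per-exchange interest assessment by early fusion of
# facial-expression and prosodic features. All data here is synthetic.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_exchanges = 200
facial = rng.normal(size=(n_exchanges, 12))   # e.g. action-unit activations (assumed)
prosodic = rng.normal(size=(n_exchanges, 8))  # e.g. F0/energy statistics (assumed)

X = np.hstack([facial, prosodic])             # early fusion of both modalities
y = rng.integers(0, 2, size=n_exchanges)      # 1 = user showed interest, 0 = not

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X, y)
print("interested?", clf.predict(X[:1]))
```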
Deictic gestures in coaching interactions
I. D. Kok, J. Hough, David Schlangen, S. Kopp
DOI: 10.1145/3011263.3011267 (https://doi.org/10.1145/3011263.3011267)
Abstract: In motor skill coaching interactions, coaches use several techniques to improve the coachee's motor skill. Through goal setting, explanations, instructions and feedback, the coachee is motivated and guided to improve. These verbal speech actions are often accompanied by iconic or deictic gestures and other nonverbal acts, such as demonstrations. We are building a virtual coach capable of the same behaviour. In this paper we take a closer look at the form, type and timing of deictic gestures in our corpus of human-human coaching interactions. We show that a significant proportion of the deictic gestures actually touch the referred object, that most of the gestures are complementary (contrary to previous research), and that they often occur before the lexical affiliate.
Published: 2016-11-12
Citations: 2
Attitude recognition of video bloggers using audio-visual descriptors
F. Haider, L. Cerrato, S. Luz, N. Campbell
DOI: 10.1145/3011263.3011270 (https://doi.org/10.1145/3011263.3011270)
Abstract: In social media, vlogs (video blogs) are a form of unidirectional communication in which vloggers (video bloggers) convey their messages (opinions, thoughts, etc.) to a potential audience that cannot give them feedback in real time. In this kind of communication, the non-verbal behaviour and personality impression of a video blogger tend to influence viewers' attention, because non-verbal cues are correlated with the messages a vlogger conveys. In this study, we use acoustic and visual features (body movements captured by low-level visual descriptors) to predict six different attitudes (amusement, enthusiasm, friendliness, frustration, impatience and neutral) annotated in the speech of 10 video bloggers. Automatic attitude detection can be helpful where a machine must automatically give bloggers feedback on how well they engage their audience by displaying certain attitudes. Attitude recognition models are trained using a random forest classifier. Results show that: 1) acoustic features provide better accuracy than visual features; 2) while fusing audio and visual features does not increase overall accuracy, it improves the results for some attitudes and subjects; and 3) densely extracted histograms of flow provide better results than other visual descriptors. A three-class problem (positive, negative and neutral attitudes) was also defined. Results for this setting show that feature fusion degrades overall classifier accuracy, and the classifiers perform better on the original six-class problem than on the three-class setting.
Published: 2016-11-12
Citations: 6
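The abstract names the classifier (random forest) and the three feature conditions it compares (audio, visual, fusion). The sketch below reproduces that comparison on synthetic data; the feature dimensions and cross-validation protocol are assumptions.

```python
# Sketch of the attitude-recognition comparison: random forests trained on
# acoustic features, visual descriptors, and their early fusion.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

ATTITUDES = ["amusement", "enthusiasm", "friendliness",
             "frustration", "impatience", "neutral"]

rng = np.random.default_rng(1)
n = 300
audio = rng.normal(size=(n, 40))   # e.g. prosodic/spectral statistics (assumed)
visual = rng.normal(size=(n, 60))  # e.g. histograms of optical flow (assumed)
y = rng.integers(0, len(ATTITUDES), size=n)

for name, X in [("audio", audio), ("visual", visual),
                ("fusion", np.hstack([audio, visual]))]:
    acc = cross_val_score(RandomForestClassifier(n_estimators=100), X, y, cv=5)
    print(f"{name}: mean accuracy {acc.mean():.2f}")
```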
Increasing robustness of multimodal interaction via individual interaction histories
Felix Schüssel, F. Honold, N. Bubalo, M. Weber
DOI: 10.1145/3011263.3011273 (https://doi.org/10.1145/3011263.3011273)
Abstract: Multimodal input fusion can be considered a well-researched topic, and yet it is rarely found in real-world applications. One reason for this could be a lack of robustness in real-world situations, especially regarding unimodal recognition technologies like speech and gesture, which tend to produce erroneous inputs that cannot be detected by the subsequent multimodal input fusion mechanism. Previous work suggesting that such errors can be detected and overcome through knowledge of individual temporal behaviours has neither provided a real-time implementation nor evaluated the real benefit of such an approach. We present such an implementation, applying individual interaction histories to increase the robustness of multimodal inputs in a smartwatch scenario. We show how this knowledge can be created and maintained at runtime, present evaluation data from an experiment conducted in a realistic scenario, and compare the approach to the state of the art known from the literature. Our approach is ready to use in other applications and existing systems, with the prospect of increasing the overall robustness of future multimodal systems.
Published: 2016-11-12
Citations: 1
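One plausible reading of "individual interaction histories maintained at runtime" is a per-user record of temporal behaviour, such as the typical offset between speech and gesture onsets, against which new input pairs are checked. The sketch below implements that idea with rolling statistics and a 3-sigma acceptance window; both the statistic and the threshold are assumptions, and the paper's actual runtime model may differ.

```python
# Loose sketch: flag multimodal input pairs whose temporal offset deviates
# strongly from this user's recorded history.
from collections import deque
import statistics

class InteractionHistory:
    def __init__(self, maxlen: int = 50):
        # Recent speech-to-gesture onset offsets for one user, in seconds.
        self.offsets = deque(maxlen=maxlen)

    def record(self, offset: float) -> None:
        self.offsets.append(offset)

    def plausible(self, offset: float) -> bool:
        """Accept an input pair whose offset lies near this user's history."""
        if len(self.offsets) < 5:  # too little history yet: accept everything
            return True
        mu = statistics.fmean(self.offsets)
        sigma = statistics.stdev(self.offsets) or 1e-6
        return abs(offset - mu) <= 3 * sigma

history = InteractionHistory()
for o in [0.21, 0.25, 0.19, 0.23, 0.22, 0.20]:
    history.record(o)
print(history.plausible(0.24))  # True: matches the user's usual timing
print(history.plausible(2.50))  # False: likely an erroneous recognition
```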
Analysis of gesture frequency and amplitude as a function of personality in virtual agents
Alex Rayón, Timothy Gonzalez, D. Novick
DOI: 10.1145/3011263.3011266 (https://doi.org/10.1145/3011263.3011266)
Abstract: Embodied conversational agents (ECAs) are changing the way humans interact with technology. To develop humanlike ECAs, they need to be able to perform the natural gestures used in day-to-day conversation. Gestures can give insight into an ECA's personality trait of extraversion, but what factors into it is still being explored. Our study focuses on two aspects of gesture: amplitude and frequency. Our goal is to find out whether agents should use specific gestures more frequently than others depending on the personality type they have been designed with. We also quantify gesture amplitude and compare it to a previous study on the perceived naturalness of an agent's gestures. Our results give some indication that introverts and extraverts judge the agent's naturalness similarly. The larger the amplitude our agent used, the more natural its gestures were perceived to be. Gesture frequency hardly differed between extraverts and introverts, even in terms of the types of gesture used.
Published: 2016-11-12
Citations: 3
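The study's two measures can be operationalised straightforwardly; the sketch below treats amplitude as peak wrist displacement from a rest position and frequency as gestures per minute. Both definitions are assumptions, not necessarily the authors' metrics.

```python
# Sketch of the two gesture measures examined in the study, on synthetic data.
import numpy as np

def gesture_amplitude(wrist_xyz: np.ndarray, rest_xyz: np.ndarray) -> float:
    """Peak Euclidean displacement of the wrist from rest during one gesture."""
    return float(np.linalg.norm(wrist_xyz - rest_xyz, axis=1).max())

def gesture_frequency(n_gestures: int, duration_s: float) -> float:
    """Gestures per minute over an interaction segment."""
    return 60.0 * n_gestures / duration_s

# Synthetic wrist track: a random walk of 120 frames in 3D.
track = np.cumsum(np.random.default_rng(2).normal(0, 0.01, (120, 3)), axis=0)
print(gesture_amplitude(track, rest_xyz=np.zeros(3)))
print(gesture_frequency(n_gestures=18, duration_s=300.0))  # 3.6 per minute
```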
Proceedings of the Workshop on Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction
Ronald Böck, Francesca Bonin, N. Campbell, R. Poppe
DOI: 10.1145/3011263 (https://doi.org/10.1145/3011263)
Published: 2016-11-12
Citations: 0
Annotation and analysis of listener's engagement based on multi-modal behaviors
K. Inoue, Divesh Lala, Shizuka Nakamura, K. Takanashi, Tatsuya Kawahara
DOI: 10.1145/3011263.3011271 (https://doi.org/10.1145/3011263.3011271)
Abstract: We address the annotation of engagement in the context of human-machine interaction. Engagement represents how interested a user is in the current interaction and how willing they are to continue it. The conversational data used in the annotation work is a human-robot interaction corpus in which a human subject talks with the android ERICA, remotely operated by another human subject. The annotation was done by multiple third-party annotators, whose task was to detect the time points at which the level of engagement becomes high. The annotation results indicate that there is agreement among the annotators, although the number of annotated points differs among them. We also find that the level of engagement is related to turn-taking behaviors. Furthermore, we conducted interviews with the annotators to reveal the behaviors used to signal a high level of engagement. The results suggest that laughing, backchannels and nodding are related to the level of engagement.
Published: 2016-11-12
Citations: 10
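Since the annotations are time points rather than labels over fixed segments, agreement between annotators has to be checked by matching points within a tolerance window. The sketch below shows one simple way to do this; the greedy matching scheme and the 1-second tolerance are assumptions, not the paper's protocol.

```python
# Sketch: count annotator-A engagement time points that have an unused
# annotator-B point within a tolerance window (greedy nearest match).
def matched_points(a: list[float], b: list[float], tol: float = 1.0) -> int:
    used, matches = set(), 0
    for t in a:
        candidates = [(abs(t - u), i) for i, u in enumerate(b)
                      if i not in used and abs(t - u) <= tol]
        if candidates:
            used.add(min(candidates)[1])  # claim the closest unused B point
            matches += 1
    return matches

ann_a = [12.3, 45.0, 78.9, 102.4]  # seconds into the session (synthetic)
ann_b = [12.8, 44.1, 101.9]
m = matched_points(ann_a, ann_b)
print(f"agreement: {m}/{len(ann_a)} of A's points matched")  # 3/4
```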
Automatic annotation of gestural units in spontaneous face-to-face interaction
Simon Alexanderson, D. House, J. Beskow
DOI: 10.1145/3011263.3011268 (https://doi.org/10.1145/3011263.3011268)
Abstract: Speech and gesture co-occur in spontaneous dialogue in a highly complex fashion. There is large variability in the motion people exhibit during a dialogue, and different kinds of motion occur during different states of the interaction. A wide range of multimodal interface applications can be envisioned, for example in the fields of virtual agents or social robots, where it is important to automatically identify gestures that carry information and discriminate them from other types of motion. While it is easy for a human to distinguish and segment manual gestures from a flow of multimodal information, the same task is not trivial for a machine. In this paper we present a method to automatically segment and label gestural units from a stream of 3D motion capture data. The gestural flow is modeled with a 2-level hierarchical hidden Markov model (HHMM) in which the sub-states correspond to gesture phases. The model is trained on labels of complete gesture units and self-adaptive manipulators. It is tested and validated on two datasets differing in genre and in method of capturing motion, and outperforms a state-of-the-art SVM classifier on a publicly available dataset.
Published: 2016-11-12
Citations: 6
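The paper's model is a 2-level HHMM whose sub-states are gesture phases. Common HMM libraries such as hmmlearn do not implement hierarchical models, so the sketch below flattens the idea into a single Gaussian HMM whose hidden states stand for phases (rest, preparation, stroke, retraction) and uses Viterbi decoding to segment the motion stream. The flat approximation, unsupervised training, and feature dimensions are all assumptions.

```python
# Sketch: segment a motion-feature stream into gesture phases with a flat
# Gaussian HMM, approximating the paper's 2-level HHMM.
import numpy as np
from hmmlearn import hmm

PHASES = ["rest", "preparation", "stroke", "retraction"]

rng = np.random.default_rng(3)
frames = rng.normal(size=(500, 6))  # per-frame motion features (placeholder)

model = hmm.GaussianHMM(n_components=len(PHASES), covariance_type="diag",
                        n_iter=20, random_state=3)
model.fit(frames)                 # unsupervised here; the paper trains on labels
_, states = model.decode(frames)  # Viterbi path = frame-level phase sequence

# Collapse the state path into labelled (start, end, phase) segments.
segments, start = [], 0
for i in range(1, len(states) + 1):
    if i == len(states) or states[i] != states[start]:
        segments.append((start, i, PHASES[states[start]]))
        start = i
print(segments[:5])
```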
Fitmirror: a smart mirror for positive affect in everyday user morning routines
Daniel Besserer, Johannes Bäurle, Alexander Nikic, F. Honold, Felix Schüssel, M. Weber
DOI: 10.1145/3011263.3011265 (https://doi.org/10.1145/3011263.3011265)
Abstract: This paper discusses FitMirror, a smart mirror concept for healthier living. Many people have serious problems getting up after sleeping and getting motivated for the day, or are tired and in a bad mood in the morning. The goal of FitMirror is to positively affect users' feelings by increasing their motivation, mood and feeling of fitness. While concepts for these isolated problems exist, none combines them into one system. FitMirror implements this combination and evaluates it in a study. It consists of a monitor with spy-foil, a Microsoft Kinect v2 and a Wii Balance Board, and can recognize users and their gestures with these components. Several hypotheses about the system regarding motivation, fun, difficulty and getting awake were investigated. Participants were grouped by two factors, sportsperson vs. non-sportsperson and morning person vs. non-morning person, to investigate effects based on these aspects. Results show that FitMirror can help users wake up in the morning, raise their motivation to do sports, and motivate them for the day.
Published: 2016-11-12
Citations: 33
Body movements and laughter recognition: experiments in first encounter dialogues
Kristiina Jokinen, Trung Ngo Trong, G. Wilcock
DOI: 10.1145/3011263.3011264 (https://doi.org/10.1145/3011263.3011264)
Abstract: This paper reports work on the automatic analysis of laughter and human body movements in a video corpus of human-human dialogues. We use the Nordic First Encounters video corpus, in which participants meet each other for the first time. The corpus has manual annotations of participants' head, hand and body movements as well as laughter occurrences. We employ machine learning methods to analyse the corpus using two types of features: visual features describing bounding boxes around participants' heads and bodies, which automatically capture body movement in the video, and audio speech features based on the participants' spoken contributions. We then correlate the speech and video features and apply neural network techniques to predict whether a person is laughing given a sequence of video features. The hypothesis is that laughter occurrences and body movements are synchronized, or at least that there is a significant relation between laughter activity and occurrences of body movement. Our results confirm the hypothesis of the synchrony of body movements with laughter, but we also emphasise the complexity of the problem and the need for further investigation of the feature sets and the algorithm used.
Published: 2016-11-12
Citations: 7
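The abstract specifies only "neural network techniques" for predicting laughter from a sequence of video features, so the sketch below uses a small recurrent classifier as one plausible instantiation. The GRU architecture, window length and feature dimension are assumptions.

```python
# Sketch: binary laughter prediction from a window of per-frame video
# features (e.g. bounding-box movement descriptors), using a small GRU.
import torch
import torch.nn as nn

class LaughterNet(nn.Module):
    def __init__(self, n_features: int = 8, hidden: int = 32):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                     # x: (batch, time, features)
        _, h = self.gru(x)                    # h: (1, batch, hidden) final state
        return self.head(h[-1]).squeeze(-1)   # logit: laughing vs. not

model = LaughterNet()
window = torch.randn(4, 50, 8)  # 4 windows of 50 frames, 8 features each
prob = torch.sigmoid(model(window))
print(prob.shape)               # torch.Size([4]): one probability per window
```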