{"title":"Retrieving Target Gestures Toward Speech Driven Animation with Meaningful Behaviors","authors":"Najmeh Sadoughi, C. Busso","doi":"10.1145/2818346.2820750","DOIUrl":"https://doi.org/10.1145/2818346.2820750","url":null,"abstract":"Creating believable behaviors for conversational agents (CAs) is a challenging task, given the complex relationship between speech and various nonverbal behaviors. The two main approaches are rule-based systems, which tend to produce behaviors with limited variations compared to natural interactions, and data-driven systems, which tend to ignore the underlying semantic meaning of the message (e.g., gestures without meaning). We envision a hybrid system, acting as the behavior realization layer in rule-based systems, while exploiting the rich variation in natural interactions. Constrained on a given target gesture (e.g., head nod) and speech signal, the system will generate novel realizations learned from the data, capturing the timely relationship between speech and gestures. An important task in this research is identifying multiple examples of the target gestures in the corpus. This paper proposes a data mining framework for detecting gestures of interest in a motion capture database. First, we train One-class support vector machines (SVMs) to detect candidate segments conveying the target gesture. Second, we use dynamic time alignment kernel (DTAK) to compare the similarity between the examples (i.e., target gesture) and the given segments. We evaluate the approach for five prototypical hand and head gestures showing reasonable performance. These retrieved gestures are then used to train a speech-driven framework based on dynamic Bayesian networks (DBNs) to synthesize these target behaviors.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"35 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73517836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Combining Multimodal Features within a Fusion Network for Emotion Recognition in the Wild","authors":"Bo Sun, Liandong Li, Guoyan Zhou, Xuewen Wu, Jun He, Lejun Yu, Dongxue Li, Qinglan Wei","doi":"10.1145/2818346.2830586","DOIUrl":"https://doi.org/10.1145/2818346.2830586","url":null,"abstract":"In this paper, we describe our work in the third Emotion Recognition in the Wild (EmotiW 2015) Challenge. For each video clip, we extract MSDF, LBP-TOP, HOG, LPQ-TOP and acoustic features to recognize the emotions of film characters. For the static facial expression recognition based on video frame, we extract MSDF, DCNN and RCNN features. We train linear SVM classifiers for these kinds of features on the AFEW and SFEW dataset, and we propose a novel fusion network to combine all the extracted features at decision level. The final achievement we gained is 51.02% on the AFEW testing set and 51.08% on the SFEW testing set, which are much better than the baseline recognition rate of 39.33% and 39.13%.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"287 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75425783","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic Active Learning Based on Agreement and Applied to Emotion Recognition in Spoken Interactions","authors":"Yue Zhang, E. Coutinho, Zixing Zhang, C. Quan, Björn Schuller","doi":"10.1145/2818346.2820774","DOIUrl":"https://doi.org/10.1145/2818346.2820774","url":null,"abstract":"In this contribution, we propose a novel method for Active Learning (AL) - Dynamic Active Learning (DAL) - which targets the reduction of the costly human labelling work necessary for modelling subjective tasks such as emotion recognition in spoken interactions. The method implements an adaptive query strategy that minimises the amount of human labelling work by deciding for each instance whether it should automatically be labelled by machine or manually by human, as well as how many human annotators are required. Extensive experiments on standardised test-beds show that DAL significantly improves the efficiency of conventional AL. In particular, DAL achieves the same classification accuracy obtained with AL with up to 79.17% less human annotation effort.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"30 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77852769","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sharing Touch Interfaces: Proximity-Sensitive Touch Targets for Tablet-Mediated Collaboration","authors":"Ilhan Aslan, Thomas Meneweger, Verena Fuchsberger, M. Tscheligi","doi":"10.1145/2818346.2820740","DOIUrl":"https://doi.org/10.1145/2818346.2820740","url":null,"abstract":"During conversational practices, such as a tablet-mediated sales conversation between a salesperson and a customer, tablets are often used by two users who prefer specific bodily formations in order to easily face each other and the surface of the touchscreen. In a series of studies, we investigated bodily formations that are preferred during tablet-mediated sales conversations, and explored the effect of these formations on performance in acquiring touch targets (e.g., buttons) on a tablet device. We found that bodily formations cause decreased viewing angles to the shared screen, which results in a decreased performance in target acquisition. In order to address this issue, a multi-modal design consideration is presented, which combines mid-air finger movement and touch into a unified input modality, allowing the design of proximity sensitive touch targets. We conclude that the proposed embodied interaction design not only has potential to improve targeting performance, but also adapts the ``agency' of touch targets for multi-user settings.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"57 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85040437","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CuddleBits: Friendly, Low-cost Furballs that Respond to Touch","authors":"Laura Cang, Paul Bucci, Karon E Maclean","doi":"10.1145/2818346.2823293","DOIUrl":"https://doi.org/10.1145/2818346.2823293","url":null,"abstract":"We present a real-time touch gesture recognition system using a low-cost fabric pressure sensor mounted on a small zoomorphic robot, affectionately called the `CuddleBit'. We explore the relationship between gesture recognition and affect through the lens of human-robot interaction. We demonstrate our real-time gesture recognition system, including both software and hardware, and a haptic display that brings the CuddleBit to life.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"429 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83589054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Oral Session 3: Language, Speech and Dialog","authors":"J. Lehman","doi":"10.1145/3252448","DOIUrl":"https://doi.org/10.1145/3252448","url":null,"abstract":"","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"41 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85435502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analyzing Multimodality of Video for User Engagement Assessment","authors":"F. Salim, F. Haider, Owen Conlan, S. Luz, N. Campbell","doi":"10.1145/2818346.2820775","DOIUrl":"https://doi.org/10.1145/2818346.2820775","url":null,"abstract":"These days, several hours of new video content is uploaded to the internet every second. It is simply impossible for anyone to see every piece of video which could be engaging or even useful to them. Therefore it is desirable to identify videos that might be regarded as engaging automatically, for a variety of applications such as recommendation and personalized video segmentation etc. This paper explores how multimodal characteristics of video, such as prosodic, visual and paralinguistic features, can help in assessing user engagement with videos. The approach proposed in this paper achieved good accuracy (maximum F score of 96.93 %) through a novel combination of features extracted directly from video recordings, demonstrating the potential of this method in identifying engaging content.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"30 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78485996","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Application of Word Processor UI paradigms to Audio and Animation Editing","authors":"A. D. Milota","doi":"10.1145/2818346.2823292","DOIUrl":"https://doi.org/10.1145/2818346.2823292","url":null,"abstract":"This demonstration showcases Quixotic, an audio editor, and Quintessence, an animation editor. Both appropriate many of the interaction techniques found in word processors, and allow users to more quickly create time-variant media. Our different approach to the interface aims to make recorded speech and simple animation into media that can be efficiently used for one-to-one asynchronous communications, quick note taking and documentation, as well as for idea refinement.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"160 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80101536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimodal Public Speaking Performance Assessment","authors":"T. Wörtwein, Mathieu Chollet, Boris Schauerte, Louis-Philippe Morency, R. Stiefelhagen, Stefan Scherer","doi":"10.1145/2818346.2820762","DOIUrl":"https://doi.org/10.1145/2818346.2820762","url":null,"abstract":"The ability to speak proficiently in public is essential for many professions and in everyday life. Public speaking skills are difficult to master and require extensive training. Recent developments in technology enable new approaches for public speaking training that allow users to practice in engaging and interactive environments. Here, we focus on the automatic assessment of nonverbal behavior and multimodal modeling of public speaking behavior. We automatically identify audiovisual nonverbal behaviors that are correlated to expert judges' opinions of key performance aspects. These automatic assessments enable a virtual audience to provide feedback that is essential for training during a public speaking performance. We utilize multimodal ensemble tree learners to automatically approximate expert judges' evaluations to provide post-hoc performance assessments to the speakers. Our automatic performance evaluation is highly correlated with the experts' opinions with r = 0.745 for the overall performance assessments. We compare multimodal approaches with single modalities and find that the multimodal ensembles consistently outperform single modalities.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"43 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80239130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AttentiveLearner: Adaptive Mobile MOOC Learning via Implicit Cognitive States Inference","authors":"Xiang Xiao, Phuong Pham, Jingtao Wang","doi":"10.1145/2818346.2823297","DOIUrl":"https://doi.org/10.1145/2818346.2823297","url":null,"abstract":"This demo presents AttentiveLearner, a mobile learning system optimized for consuming lecture videos in Massive Open Online Courses (MOOCs) and flipped classrooms. AttentiveLearner uses on-lens finger gestures for video control and captures learners' physiological states through implicit heart rate tracking on unmodified mobile phones. Through three user studies to date, we found AttentiveLearner easy to learn, and intuitive to use. The heart beat waveforms captured by AttentiveLearner can be used to infer learners' cognitive states and attention. AttentiveLearner may serve as a promising supplemental feedback channel orthogonal to today's learning analytics technologies.","PeriodicalId":20486,"journal":{"name":"Proceedings of the 2015 ACM on International Conference on Multimodal Interaction","volume":"64 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80329291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}