{"title":"ETS System for AV+EC 2015 Challenge","authors":"P. Cardinal, N. Dehak, Alessandro Lameiras Koerich, J. Alam, Patrice Boucher","doi":"10.1145/2808196.2811639","DOIUrl":"https://doi.org/10.1145/2808196.2811639","url":null,"abstract":"This paper presents the system that we have developed for the AV+EC 2015 challenge which is mainly based on deep neural networks (DNNs). We have investigated different options using the audio feature set as a base system. The improvements that were achieved on this specific modality have been applied to other modalities. One of our main findings is that the frame stacking technique improves the quality of the predictions made by our model, and the improvements were also observed in all other modalities. Besides that, we also present a new feature set derived from the cardiac rhythm that were extracted from electrocardiogram readings. Such a new feature set helped us to improve the concordance correlation coefficient from 0.088 to 0.124 (on the development set) for the valence, an improvement of 25%. Finally, the fusion of all modalities has been studied using fusion at feature level using a DNN and at prediction level by training linear and random forest regressors. Both fusion schemes provided promising results.","PeriodicalId":123597,"journal":{"name":"Proceedings of the 5th International Workshop on Audio/Visual Emotion Challenge","volume":"159 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127435826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimodal Affective Analysis combining Regularized Linear Regression and Boosted Regression Trees","authors":"Aleksandar Milchevski, A. Rozza, D. Taskovski","doi":"10.1145/2808196.2811636","DOIUrl":"https://doi.org/10.1145/2808196.2811636","url":null,"abstract":"In this paper we present a multimodal approach for affective analysis that exploits features from video, audio, Electrocardiogram (ECG), and Electrodermal Activity (EDA) combining two regression techniques, namely Boosted Regression Trees and Linear Regression. Moreover, we propose a novel regularization approach for the Linear Regression in order to exploit the temporal correlation of the affective dimensions. The final prediction is obtained using a decision level fusion of the regressors individually trained on the different groups of features. The promising results obtained on the benchmark dataset show the efficacy and effectiveness of the proposed approach.","PeriodicalId":123597,"journal":{"name":"Proceedings of the 5th International Workshop on Audio/Visual Emotion Challenge","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124599386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: AV+EC 2015 Part 2","authors":"F. Ringeval","doi":"10.1145/3247560","DOIUrl":"https://doi.org/10.1145/3247560","url":null,"abstract":"","PeriodicalId":123597,"journal":{"name":"Proceedings of the 5th International Workshop on Audio/Visual Emotion Challenge","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124962452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Keynote","authors":"F. Ringeval","doi":"10.1145/3247557","DOIUrl":"https://doi.org/10.1145/3247557","url":null,"abstract":"","PeriodicalId":123597,"journal":{"name":"Proceedings of the 5th International Workshop on Audio/Visual Emotion Challenge","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124243177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Vocal Emotion Recognition with Log-Gabor Filters","authors":"Yu Gu, E. Postma, H. Lin","doi":"10.1145/2808196.2811635","DOIUrl":"https://doi.org/10.1145/2808196.2811635","url":null,"abstract":"Vocal emotion recognition aims to identify the emotional states of speakers by analyzing their speech signal. This paper builds on the work of Ezzat, Bouvrie and Poggio by performing a spectro-temporal analysis of affective vocalizations by decomposing the associated spectrogram with 2D Gabor filters. Based on the previous studies of the emotion expression in voices and the turn out display in spectrogram, we assumed that each vocal emotion has a unique spectro-temporal signature in terms of orientated energy bands which can be detected by properly tuned Gabor filters. We compared the emotion-recognition performances of tuned log-Gabor filters with standard acoustic features. The experimental results show that applying pairs of log-Gabor filters to extract features from the spectrogram yields a performance that matches the performance of an approach based on traditional acoustic features. Their combined emotion recognition performance outperforms state-of-the-art vocal emotion recognition algorithms. This leads us to conclude that tuned log-Gabor filters support the automatic recognition of emotions from speech and may be beneficial to other speech-related tasks.","PeriodicalId":123597,"journal":{"name":"Proceedings of the 5th International Workshop on Audio/Visual Emotion Challenge","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114368414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ensemble Methods for Continuous Affect Recognition: Multi-modality, Temporality, and Challenges","authors":"Markus Kächele, Patrick Thiam, G. Palm, F. Schwenker, Martin Schels","doi":"10.1145/2808196.2811637","DOIUrl":"https://doi.org/10.1145/2808196.2811637","url":null,"abstract":"In this paper we present a multi-modal system based on audio, video and bio-physiological features for continuous recognition of human affect in unconstrained scenarios. We leverage the robustness of ensemble classifiers as base learners and refine the predictions using stochastic gradient descent based optimization on the desired loss function. Furthermore we provide a discussion about pre- and post-processing steps that help to improve the robustness of the regression and subsequently the prediction quality.","PeriodicalId":123597,"journal":{"name":"Proceedings of the 5th International Workshop on Audio/Visual Emotion Challenge","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126986675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Long Short Term Memory Recurrent Neural Network based Multimodal Dimensional Emotion Recognition","authors":"Linlin Chao, J. Tao, Minghao Yang, Ya Li, Zhengqi Wen","doi":"10.1145/2808196.2811634","DOIUrl":"https://doi.org/10.1145/2808196.2811634","url":null,"abstract":"This paper presents our effort to the Audio/Visual+ Emotion Challenge (AV+EC2015), whose goal is to predict the continuous values of the emotion dimensions arousal and valence from audio, visual and physiology modalities. The state of art classifier for dimensional recognition, long short term memory recurrent neural network (LSTM-RNN) is utilized. Except regular LSTM-RNN prediction architecture, two techniques are investigated for dimensional emotion recognition problem. The first one is ε -insensitive loss is utilized as the loss function to optimize. Compared to squared loss function, which is the most widely used loss function for dimension emotion recognition, ε -insensitive loss is more robust for the label noises and it can ignore small errors to get stronger correlation between predictions and labels. The other one is temporal pooling. This technique enables temporal modeling in the input features and increases the diversity of the features fed into the forward prediction architecture. Experiments results show the efficiency of key points of the proposed method and competitive results are obtained.","PeriodicalId":123597,"journal":{"name":"Proceedings of the 5th International Workshop on Audio/Visual Emotion Challenge","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128184068","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AV+EC 2015: The First Affect Recognition Challenge Bridging Across Audio, Video, and Physiological Data","authors":"F. Ringeval, Björn Schuller, M. Valstar, S. Jaiswal, E. Marchi, D. Lalanne, R. Cowie, M. Pantic","doi":"10.1145/2808196.2811642","DOIUrl":"https://doi.org/10.1145/2808196.2811642","url":null,"abstract":"We present the first Audio-Visual+ Emotion recognition Challenge and workshop (AV+EC 2015) aimed at comparison of multimedia processing and machine learning methods for automatic audio, visual and physiological emotion analysis. This is the 5th event in the AVEC series, but the very first Challenge that bridges across audio, video and physiological data. The goal of the Challenge is to provide a common benchmark test set for multimodal information processing and to bring together the audio, video and physiological emotion recognition communities, to compare the relative merits of the three approaches to emotion recognition under well-defined and strictly comparable conditions and establish to what extent fusion of the approaches is possible and beneficial. This paper presents the challenge, the dataset and the performance of the baseline system.","PeriodicalId":123597,"journal":{"name":"Proceedings of the 5th International Workshop on Audio/Visual Emotion Challenge","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114923505","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-modal Dimensional Emotion Recognition using Recurrent Neural Networks","authors":"Shizhe Chen, Qin Jin","doi":"10.1145/2808196.2811638","DOIUrl":"https://doi.org/10.1145/2808196.2811638","url":null,"abstract":"Emotion recognition has been an active research area with both wide applications and big challenges. This paper presents our effort for the Audio/Visual Emotion Challenge (AVEC2015), whose goal is to explore utilizing audio, visual and physiological signals to continuously predict the value of the emotion dimensions (arousal and valence). Our system applies the Recurrent Neural Networks (RNN) to model temporal information. We explore various aspects to improve the prediction performance including: the dominant modalities for arousal and valence prediction, duration of features, novel loss functions, directions of Long Short Term Memory (LSTM), multi-task learning, different structures for early feature fusion and late fusion. Best settings are chosen according to the performance on the development set. Competitive experimental results compared with the baseline show the effectiveness of the proposed methods.","PeriodicalId":123597,"journal":{"name":"Proceedings of the 5th International Workshop on Audio/Visual Emotion Challenge","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126989036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring the Importance of Individual Differences to the Automatic Estimation of Emotions Induced by Music","authors":"Hesam Sagha, E. Coutinho, Björn Schuller","doi":"10.1145/2808196.2811643","DOIUrl":"https://doi.org/10.1145/2808196.2811643","url":null,"abstract":"The goal of this study was to evaluate the impact of the inclusion of listener-related factors (individual differences) on the prediction of music induced affect. A group of 24 subjects listened to a set of music excerpts previously demonstrated to express specific emotional characteristics (in terms of Arousal and Valence), and we collected information related to listeners' stable (personality, emotional intelligence, attentiveness, music preferences) and transient (mood, and physiological activity) states. Through a series of regression analysis we identified those factors which have a significant explanatory power over the affective states induced in the listeners. Our results show that incorporating information related to individual differences permits to identify more accurately the affective states induced in the listeners, which differ from those expressed by the music.","PeriodicalId":123597,"journal":{"name":"Proceedings of the 5th International Workshop on Audio/Visual Emotion Challenge","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115929061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}