AVEC '14 · Pub Date: 2014-11-07 · DOI: 10.1145/2661806.2661811
Linlin Chao, J. Tao, Minghao Yang, Ya Li, Zhengqi Wen
{"title":"Multi-scale Temporal Modeling for Dimensional Emotion Recognition in Video","authors":"Linlin Chao, J. Tao, Minghao Yang, Ya Li, Zhengqi Wen","doi":"10.1145/2661806.2661811","DOIUrl":"https://doi.org/10.1145/2661806.2661811","url":null,"abstract":"Understanding nonverbal behaviors in human machine interaction is a complex and challenge task. One of the key aspects is to recognize human emotion states accurately. This paper presents our effort to the Audio/Visual Emotion Challenge (AVEC'14), whose goal is to predict the continuous values of the emotion dimensions arousal, valence and dominance at each moment in time. The proposed method utilizes deep belief network based models to recognize emotion states from audio and visual modalities. Firstly, we employ temporal pooling functions in the deep neutral network to encode dynamic information in the features, which achieves the first time scale temporal modeling. Secondly, we combine the predicted results from different modalities and emotion temporal context information simultaneously. The proposed multimodal-temporal fusion achieves temporal modeling for the emotion states in the second time scale. Experiments results show the efficiency of each key point of the proposed method and competitive results are obtained","PeriodicalId":318508,"journal":{"name":"AVEC '14","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128073090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AVEC '14 · Pub Date: 2014-11-07 · DOI: 10.1145/2661806.2661812
Asim Jan, H. Meng, Y. F. A. Gaus, Fan Zhang, Saeed Turabzadeh
{"title":"Automatic Depression Scale Prediction using Facial Expression Dynamics and Regression","authors":"Asim Jan, H. Meng, Y. F. A. Gaus, Fan Zhang, Saeed Turabzadeh","doi":"10.1145/2661806.2661812","DOIUrl":"https://doi.org/10.1145/2661806.2661812","url":null,"abstract":"Depression is a state of low mood and aversion to activity that can affect a person's thoughts, behavior, feelings and sense of well-being. In such a low mood, both the facial expression and voice appear different from the ones in normal states. In this paper, an automatic system is proposed to predict the scales of Beck Depression Inventory from naturalistic facial expression of the patients with depression. Firstly, features are extracted from corresponding video and audio signals to represent characteristics of facial and vocal expression under depression. Secondly, dynamic features generation method is proposed in the extracted video feature space based on the idea of Motion History Histogram (MHH) for 2-D video motion extraction. Thirdly, Partial Least Squares (PLS) and Linear regression are applied to learn the relationship between the dynamic features and depression scales using training data, and then to predict the depression scale for unseen ones. Finally, decision level fusion was done for combining predictions from both video and audio modalities. The proposed approach is evaluated on the AVEC2014 dataset and the experimental results demonstrate its effectiveness.","PeriodicalId":318508,"journal":{"name":"AVEC '14","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116909684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AVEC '14 · Pub Date: 2014-11-07 · DOI: 10.1145/2661806.2661810
Rahul Gupta, Nikos Malandrakis, Bo Xiao, T. Guha, Maarten Van Segbroeck, M. Black, A. Potamianos, Shrikanth S. Narayanan
{"title":"Multimodal Prediction of Affective Dimensions and Depression in Human-Computer Interactions","authors":"Rahul Gupta, Nikos Malandrakis, Bo Xiao, T. Guha, Maarten Van Segbroeck, M. Black, A. Potamianos, Shrikanth S. Narayanan","doi":"10.1145/2661806.2661810","DOIUrl":"https://doi.org/10.1145/2661806.2661810","url":null,"abstract":"Depression is one of the most common mood disorders. Technology has the potential to assist in screening and treating people with depression by robustly modeling and tracking the complex behavioral cues associated with the disorder (e.g., speech, language, facial expressions, head movement, body language). Similarly, robust affect recognition is another challenge which stands to benefit from modeling such cues. The Audio/Visual Emotion Challenge (AVEC) aims toward understanding the two phenomena and modeling their correlation with observable cues across several modalities. In this paper, we use multimodal signal processing methodologies to address the two problems using data from human-computer interactions. We develop separate systems for predicting depression levels and affective dimensions, experimenting with several methods for combining the multimodal information. The proposed depression prediction system uses a feature selection approach based on audio, visual, and linguistic cues to predict depression scores for each session. Similarly, we use multiple systems trained on audio and visual cues to predict the affective dimensions in continuous-time. Our affect recognition system accounts for context during the frame-wise inference and performs a linear fusion of outcomes from the audio-visual systems. For both problems, our proposed systems outperform the video-feature based baseline systems. As part of this work, we analyze the role played by each modality in predicting the target variable and provide analytical insights.","PeriodicalId":318508,"journal":{"name":"AVEC '14","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114673047","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AVEC '14 · Pub Date: 2014-11-07 · DOI: 10.1145/2661806.2661808
Ailbhe Cullen, Andrew Hines, N. Harte
{"title":"Building a Database of Political Speech: Does Culture Matter in Charisma Annotations?","authors":"Ailbhe Cullen, Andrew Hines, N. Harte","doi":"10.1145/2661806.2661808","DOIUrl":"https://doi.org/10.1145/2661806.2661808","url":null,"abstract":"For both individual politicians and political parties the internet has become a vital tool for self-promotion and the distribution of ideas. The rise of streaming has enabled political debates and speeches to reach global audiences. In this paper, we explore the nature of charisma in political speech, with a view to automatic detection. To this end, we have collected a new database of political speech from YouTube and other on-line resources. Annotation is performed both by native listeners, and Amazon Mechanical Turk (AMT) workers. Detailed analysis shows that both label sets are equally reliable. The results support the use of crowd-sourced labels for speaker traits such as charisma in political speech, even where cultural subtleties are present. The impact of these different annotations on charisma prediction from political speech is also investigated.","PeriodicalId":318508,"journal":{"name":"AVEC '14","volume":"97 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132840347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AVEC '14 · Pub Date: 2014-11-07 · DOI: 10.1145/2661806.2661807
M. Valstar, Björn Schuller, Kirsty Smith, Timur R. Almaev, F. Eyben, J. Krajewski, R. Cowie, M. Pantic
{"title":"AVEC 2014: 3D Dimensional Affect and Depression Recognition Challenge","authors":"M. Valstar, Björn Schuller, Kirsty Smith, Timur R. Almaev, F. Eyben, J. Krajewski, R. Cowie, M. Pantic","doi":"10.1145/2661806.2661807","DOIUrl":"https://doi.org/10.1145/2661806.2661807","url":null,"abstract":"Mood disorders are inherently related to emotion. In particular, the behaviour of people suffering from mood disorders such as unipolar depression shows a strong temporal correlation with the affective dimensions valence, arousal and dominance. In addition to structured self-report questionnaires, psychologists and psychiatrists use in their evaluation of a patient's level of depression the observation of facial expressions and vocal cues. It is in this context that we present the fourth Audio-Visual Emotion recognition Challenge (AVEC 2014). This edition of the challenge uses a subset of the tasks used in a previous challenge, allowing for more focussed studies. In addition, labels for a third dimension (Dominance) have been added and the number of annotators per clip has been increased to a minimum of three, with most clips annotated by 5. The challenge has two goals logically organised as sub-challenges: the first is to predict the continuous values of the affective dimensions valence, arousal and dominance at each moment in time. The second is to predict the value of a single self-reported severity of depression indicator for each recording in the dataset. This paper presents the challenge guidelines, the common data used, and the performance of the baseline system on the two tasks.","PeriodicalId":318508,"journal":{"name":"AVEC '14","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128667795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AVEC '14 · Pub Date: 2014-11-07 · DOI: 10.1145/2661806.2661818
V. Mitra, Elizabeth Shriberg, Mitchell McLaren, A. Kathol, Colleen Richey, D. Vergyri, M. Graciarena
{"title":"The SRI AVEC-2014 Evaluation System","authors":"V. Mitra, Elizabeth Shriberg, Mitchell McLaren, A. Kathol, Colleen Richey, D. Vergyri, M. Graciarena","doi":"10.1145/2661806.2661818","DOIUrl":"https://doi.org/10.1145/2661806.2661818","url":null,"abstract":"Though depression is a common mental health problem with significant impact on human society, it often goes undetected. We explore a diverse set of features based only on spoken audio to understand which features correlate with self-reported depression scores according to the Beck depression rating scale. These features, many of which are novel for this task, include (1) estimated articulatory trajectories during speech production, (2) acoustic characteristics, (3) acoustic-phonetic characteristics and (4) prosodic features. Features are modeled using a variety of approaches, including support vector regression, a Gaussian backend and decision trees. We report results on the AVEC-2014 depression dataset and find that individual systems range from 9.18 to 11.87 in root mean squared error (RMSE), and from 7.68 to 9.99 in mean absolute error (MAE). Initial fusion brings further improvement; fusion and feature selection work is still in progress.","PeriodicalId":318508,"journal":{"name":"AVEC '14","volume":"140 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122461419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AVEC '14 · Pub Date: 2014-11-07 · DOI: 10.1145/2661806.2661819
Mohammed Senoussaoui, Milton Orlando Sarria Paja, J. F. Santos, T. Falk
{"title":"Model Fusion for Multimodal Depression Classification and Level Detection","authors":"Mohammed Senoussaoui, Milton Orlando Sarria Paja, J. F. Santos, T. Falk","doi":"10.1145/2661806.2661819","DOIUrl":"https://doi.org/10.1145/2661806.2661819","url":null,"abstract":"Audio-visual emotion and mood disorder cues have been recently explored to develop tools to assist psychologists and psychiatrists in evaluating a patient's level of depression. In this paper, we present a number of different multimodal depression level predictors using a model fusion approach, in the context of the AVEC14 challenge. We show that an i-vector based representation for short term audio features contains useful information for depression classification and prediction. We also employed a classification step prior to regression to allow having different regression models depending on the presence or absence of depression. Our experiments show that a combination of our audio-based model and two other models based on the LGBP-TOP video features lead to an improvement of 4% over the baseline model proposed by the challenge organizers.","PeriodicalId":318508,"journal":{"name":"AVEC '14","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125387372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AVEC '14 · Pub Date: 2014-11-07 · DOI: 10.1145/2661806.2661815
Humberto Pérez Espinosa, H. Escalante, Luis Villaseñor-Pineda, M. Montes-y-Gómez, David Pinto, Verónica Reyes-Meza
{"title":"Fusing Affective Dimensions and Audio-Visual Features from Segmented Video for Depression Recognition: INAOE-BUAP's Participation at AVEC'14 Challenge","authors":"Humberto Pérez Espinosa, H. Escalante, Luis Villaseñor-Pineda, M. Montes-y-Gómez, David Pinto, Verónica Reyes-Meza","doi":"10.1145/2661806.2661815","DOIUrl":"https://doi.org/10.1145/2661806.2661815","url":null,"abstract":"Depression is a disease that affects a considerable portion of the world population. Severe cases of depression interfere with the common live of patients, for those patients a strict monitoring is necessary in order to control the progress of the disease and to prevent undesired side effects. A way to keep track of patients with depression is by means of online monitoring via human-computer-interaction. The AVEC'14 challenge aims at developing technology towards the online monitoring of depression patients. This paper describes an approach to depression recognition from audiovisual information in the context of the AVEC'14 challenge. The proposed method relies on an effective voice segmentation procedure, followed by segment-level feature extraction and aggregation. Finally, a meta-model is trained to fuse mono-modal information. The main novel features of our proposal are that (1) we use affective dimensions for building depression recognition models; (2) we extract visual information from voice and silence segments separately; (3) we consolidate features and use a meta-model for fusion. The proposed methodology is evaluated, experimental results reveal the method is competitive.","PeriodicalId":318508,"journal":{"name":"AVEC '14","volume":"802 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124173358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AVEC '14 · Pub Date: 2014-11-07 · DOI: 10.1145/2661806.2661809
J. Williamson, T. Quatieri, Brian S. Helfer, G. Ciccarelli, D. Mehta
{"title":"Vocal and Facial Biomarkers of Depression based on Motor Incoordination and Timing","authors":"J. Williamson, T. Quatieri, Brian S. Helfer, G. Ciccarelli, D. Mehta","doi":"10.1145/2661806.2661809","DOIUrl":"https://doi.org/10.1145/2661806.2661809","url":null,"abstract":"In individuals with major depressive disorder, neurophysiological changes often alter motor control and thus affect the mechanisms controlling speech production and facial expression. These changes are typically associated with psychomotor retardation, a condition marked by slowed neuromotor output that is behaviorally manifested as altered coordination and timing across multiple motor-based properties. Changes in motor outputs can be inferred from vocal acoustics and facial movements as individuals speak. We derive novel multi-scale correlation structure and timing feature sets from audio-based vocal features and video-based facial action units from recordings provided by the 4th International Audio/Video Emotion Challenge (AVEC). The feature sets enable detection of changes in coordination, movement, and timing of vocal and facial gestures that are potentially symptomatic of depression. Combining complementary features in Gaussian mixture model and extreme learning machine classifiers, our multivariate regression scheme predicts Beck depression inventory ratings on the AVEC test set with a root-mean-square error of 8.12 and mean absolute error of 6.31. Future work calls for continued study into detection of neurological disorders based on altered coordination and timing across audio and video modalities.","PeriodicalId":318508,"journal":{"name":"AVEC '14","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126386833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AVEC '14 · Pub Date: 2014-11-07 · DOI: 10.1145/2661806.2661813
Markus Kächele, Martin Schels, F. Schwenker
{"title":"Inferring Depression and Affect from Application Dependent Meta Knowledge","authors":"Markus Kächele, Martin Schels, F. Schwenker","doi":"10.1145/2661806.2661813","DOIUrl":"https://doi.org/10.1145/2661806.2661813","url":null,"abstract":"This paper outlines our contribution to the 2014 edition of the AVEC competition. It comprises classification results and considerations for both the continuous affect recognition sub-challenge and also the depression recognition sub-challenge. Rather than relying on statistical features that are normally extracted from the raw audio-visual data we propose an approach based on abstract meta information about individual subjects and also prototypical task and label dependent templates to infer the respective emotional states. The results of the approach that were submitted to both parts of the challenge significantly outperformed the baseline approaches. Further, we elaborate on several issues about the labeling of affective corpora and the choice of appropriate performance measures.","PeriodicalId":318508,"journal":{"name":"AVEC '14","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126062191","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}