{"title":"Session details: Emotion Detection","authors":"Yorgos Tzimiropoulos","doi":"10.1145/3255912","DOIUrl":"https://doi.org/10.1145/3255912","url":null,"abstract":"","PeriodicalId":432793,"journal":{"name":"Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123592661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Sub-Challenge Winners","authors":"M. Valstar","doi":"10.1145/3255913","DOIUrl":"https://doi.org/10.1145/3255913","url":null,"abstract":"","PeriodicalId":432793,"journal":{"name":"Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128825675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimodal and Multiresolution Depression Detection from Speech and Facial Landmark Features","authors":"Md. Nasir, Arindam Jati, P. G. Shivakumar, Sandeep Nallan Chakravarthula, P. Georgiou","doi":"10.1145/2988257.2988261","DOIUrl":"https://doi.org/10.1145/2988257.2988261","url":null,"abstract":"Automatic classification of depression using audiovisual cues can help towards its objective diagnosis. In this paper, we present a multimodal depression classification system as a part of the 2016 Audio/Visual Emotion Challenge and Workshop (AVEC 2016). We investigate a number of audio and video features for classification with different fusion techniques and temporal contexts. In the audio modality, Teager energy cepstral coefficients (TECC) outperform standard baseline features, while the best accuracy is achieved with i-vector modelling based on MFCC features. On the other hand, polynomial parameterization of facial landmark features achieves the best performance among all systems and outperforms the best baseline system as well.","PeriodicalId":432793,"journal":{"name":"Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131524716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Staircase Regression in OA RVM, Data Selection and Gender Dependency in AVEC 2016","authors":"Zhaocheng Huang, Brian Stasak, T. Dang, Kalani Wataraka Gamage, P. Le, V. Sethu, J. Epps","doi":"10.1145/2988257.2988265","DOIUrl":"https://doi.org/10.1145/2988257.2988265","url":null,"abstract":"Within the field of affective computing, human emotion and disorder/disease recognition have progressively attracted more interest in multimodal analysis. This submission to the Depression Classification and Continuous Emotion Prediction challenges for AVEC2016 investigates both, with a focus on audio subsystems. For depression classification, we investigate token word selection, vocal tract coordination parameters computed from spectral centroid features, and gender-dependent classification systems. Token word selection performed very well on the development set. For emotion prediction, we investigate emotionally salient data selection based on emotion change, an output-associative regression approach based on the probabilistic outputs of relevance vector machine classifiers operating on low-high class pairs (OA RVM-SR), and gender-dependent systems. Experimental results from both the development and test sets show that the RVM-SR method under the OA framework can improve on OA RVM, which performed very well in the AV+EC2015 challenge.","PeriodicalId":432793,"journal":{"name":"Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115665259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring Multimodal Visual Features for Continuous Affect Recognition","authors":"Bo Sun, Siming Cao, Liandong Li, Jun He, Lejun Yu","doi":"10.1145/2988257.2988270","DOIUrl":"https://doi.org/10.1145/2988257.2988270","url":null,"abstract":"This paper presents our work in the Emotion Sub-Challenge of the 6th Audio/Visual Emotion Challenge and Workshop (AVEC 2016), whose goal is to explore the use of audio, visual and physiological signals to continuously predict the value of the emotion dimensions (arousal and valence). As visual features are very important in emotion recognition, we try a variety of handcrafted and deep visual features. For each video clip, besides the baseline features, we extract multi-scale Dense SIFT features (MSDF) and several types of Convolutional Neural Network (CNN) features to recognize the expression phases of the current frame. We train linear Support Vector Regression (SVR) models for each kind of feature on the RECOLA dataset. Fusion of these modalities is then performed with a multiple linear regression model. The final Concordance Correlation Coefficients (CCC) obtained on the development set are 0.824 for arousal and 0.718 for valence; on the test set, they are 0.683 for arousal and 0.642 for valence.","PeriodicalId":432793,"journal":{"name":"Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge","volume":"236 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121031307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Depression recognition","authors":"H. Gunes","doi":"10.1145/3255911","DOIUrl":"https://doi.org/10.1145/3255911","url":null,"abstract":"","PeriodicalId":432793,"journal":{"name":"Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge","volume":"342 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116481053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Continuous Multimodal Human Affect Estimation using Echo State Networks","authors":"Mohammadreza Amirian, Markus Kächele, Patrick Thiam, Viktor Kessler, F. Schwenker","doi":"10.1145/2988257.2988260","DOIUrl":"https://doi.org/10.1145/2988257.2988260","url":null,"abstract":"Continuous multimodal human affect recognition for both the arousal and valence dimensions in a non-acted, spontaneous scenario is investigated in this paper. Different regression models based on Random Forests and Echo State Networks are evaluated and compared in terms of robustness and accuracy. Moreover, an extension of Echo State Networks to a bi-directional model is introduced to improve the regression accuracy. A hybrid method using Random Forests, Echo State Networks and linear regression fusion is developed and applied to the test subset of the AVEC 2016 challenge. Finally, label shift and prediction delay are discussed, and an annotator-specific regression model, as well as a fusion architecture, is proposed for future work.","PeriodicalId":432793,"journal":{"name":"Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116785159","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimodal Analysis of Impressions and Personality in Human-Computer and Human-Robot Interactions","authors":"H. Gunes","doi":"10.1145/2988257.2988271","DOIUrl":"https://doi.org/10.1145/2988257.2988271","url":null,"abstract":"This talk will focus on automatic prediction of impressions and inferences about traits and characteristics of people based on their multimodal observable behaviours in the context of human-virtual character and human-robot interactions. The first part of the talk will introduce and describe the creation and evaluation of the MAPTRAITS system that enables on-the-fly prediction of the widely used Big Five personality dimensions (i.e., agreeableness, openness, neuroticism, conscientiousness and extroversion) from a third-person vision perspective. A novel approach for sensing and interpreting personality is through a wearable camera that provides a first-person vision (FPV) perspective and therefore enables the acquisition of information about the users' true behaviours and intentions. Accordingly, the second part of the talk will introduce computational analysis of personality traits and interaction experience through first-person vision features in a human-robot interaction context. The perception of personality is also crucial when the interaction takes place over distance. Tele-operated robot avatars, in which an operator's behaviours are portrayed by a robot proxy, have the potential to improve interactions over distance by transforming the perception of physical and social presence, and trust. However, having communication mediated by a robot changes the perception of the operator's appearance, behaviour and personality. The third and last part of the talk will therefore present a study on how robot mediation affects the way the personality of the operator is perceived, analysed and classified, and will discuss the implications our research findings have for autonomous and tele-operated robot design.","PeriodicalId":432793,"journal":{"name":"Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127787140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detecting Depression using Vocal, Facial and Semantic Communication Cues","authors":"J. Williamson, Elizabeth Godoy, Miriam Cha, Adrianne Schwarzentruber, Pooya Khorrami, Youngjune Gwon, Hsiang-Tsung Kung, Charlie K. Dagli, T. Quatieri","doi":"10.1145/2988257.2988263","DOIUrl":"https://doi.org/10.1145/2988257.2988263","url":null,"abstract":"Major depressive disorder (MDD) is known to result in neurophysiological and neurocognitive changes that affect control of motor, linguistic, and cognitive functions. MDD's impact on these processes is reflected in an individual's communication via coupled mechanisms: vocal articulation, facial gesturing and choice of content to convey in a dialogue. In particular, MDD-induced neurophysiological changes are associated with a decline in dynamics and coordination of speech and facial motor control, while neurocognitive changes influence dialogue semantics. In this paper, biomarkers are derived from all of these modalities, drawing first from previously developed neurophysiologically-motivated speech and facial coordination and timing features. In addition, a novel indicator of lower vocal tract constriction in articulation is incorporated that relates to vocal projection. Semantic features are analyzed for subject/avatar dialogue content using a sparse coded lexical embedding space, and for contextual clues related to the subject's present or past depression status. The features and depression classification system were developed for the 6th International Audio/Video Emotion Challenge (AVEC), which provides data consisting of audio, video-based facial action units, and transcribed text of individuals communicating with the human-controlled avatar. A clinical Patient Health Questionnaire (PHQ) score and binary depression decision are provided for each participant. PHQ predictions were obtained by fusing outputs from a Gaussian staircase regressor for each feature set, with results on the development set of mean F1=0.81, RMSE=5.31, and MAE=3.34. These compare favorably to the challenge baseline development results of mean F1=0.73, RMSE=6.62, and MAE=5.52. On test set evaluation, our system obtained a mean F1=0.70, which is similar to the challenge baseline test result. Future work calls for consideration of joint feature analyses across modalities in an effort to detect neurological disorders based on the interplay of motor, linguistic, affective, and cognitive components of communication.","PeriodicalId":432793,"journal":{"name":"Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge","volume":"501 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114524603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DepAudioNet: An Efficient Deep Model for Audio based Depression Classification","authors":"Xingchen Ma, Hongyu Yang, Qiang Chen, Di Huang, Yunhong Wang","doi":"10.1145/2988257.2988267","DOIUrl":"https://doi.org/10.1145/2988257.2988267","url":null,"abstract":"This paper presents a novel and effective audio-based method for depression classification. It focuses on two important issues, i.e., data representation and sample imbalance, which are not well addressed in the literature. For the former, in contrast to traditional shallow hand-crafted features, we propose a deep model, namely DepAudioNet, to encode the depression-related characteristics in the vocal channel, combining Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) to deliver a more comprehensive audio representation. For the latter, we introduce a random sampling strategy in the model training phase to balance the positive and negative samples, which largely alleviates the bias caused by the uneven sample distribution. Evaluations are carried out on the DAIC-WOZ dataset for the Depression Classification Sub-challenge (DCC) at the 2016 Audio-Visual Emotion Challenge (AVEC), and the experimental results clearly demonstrate the effectiveness of the proposed approach.","PeriodicalId":432793,"journal":{"name":"Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130566884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}