Towards a Better Gold Standard: Denoising and Modelling Continuous Emotion Annotations Based on Feature Agglomeration and Outlier Regularisation

Chen Wang, Phil Lopes, T. Pun, G. Chanel
{"title":"Towards a Better Gold Standard: Denoising and Modelling Continuous Emotion Annotations Based on Feature Agglomeration and Outlier Regularisation","authors":"Chen Wang, Phil Lopes, T. Pun, G. Chanel","doi":"10.1145/3266302.3266307","DOIUrl":null,"url":null,"abstract":"Emotions are often perceived by humans through a series of multimodal cues, such as verbal expressions, facial expressions and gestures. In order to recognise emotions automatically, reliable emotional labels are required to learn a mapping from human expressions to corresponding emotions. Dimensional emotion models have become popular and have been widely applied for annotating emotions continuously in the time domain. However, the statistical relationship between emotional dimensions is rarely studied. This paper provides a solution to automatic emotion recognition for the Audio/Visual Emotion Challenge (AVEC) 2018. The objective is to find a robust way to detect emotions using more reliable emotion annotations in the valence and arousal dimensions. The two main contributions of this paper are: 1) the proposal of a new approach capable of generating more dependable emotional ratings for both arousal and valence from multiple annotators by extracting consistent annotation features; 2) the exploration of the valence and arousal distribution using outlier detection methods, which shows a specific oblique elliptic shape. With the learned distribution, we are able to detect the prediction outliers based on their local density deviations and correct them towards the learned distribution. The proposed method performance is evaluated on the RECOLA database containing audio, video and physiological recordings. Our results show that a moving average filter is sufficient to remove the incidental errors in annotations. The unsupervised dimensionality reduction approaches could be used to determine a gold standard annotations from multiple annotations. Compared with the baseline model of AVEC 2018, our approach improved the arousal and valence prediction of concordance correlation coefficient significantly to respectively 0.821 and 0.589.","PeriodicalId":123523,"journal":{"name":"Proceedings of the 2018 on Audio/Visual Emotion Challenge and Workshop","volume":"130 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2018 on Audio/Visual Emotion Challenge and Workshop","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3266302.3266307","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7

Abstract

Emotions are often perceived by humans through a series of multimodal cues, such as verbal expressions, facial expressions and gestures. In order to recognise emotions automatically, reliable emotional labels are required to learn a mapping from human expressions to the corresponding emotions. Dimensional emotion models have become popular and have been widely applied for annotating emotions continuously in the time domain. However, the statistical relationship between emotional dimensions is rarely studied. This paper provides a solution to automatic emotion recognition for the Audio/Visual Emotion Challenge (AVEC) 2018. The objective is to find a robust way to detect emotions using more reliable emotion annotations in the valence and arousal dimensions. The two main contributions of this paper are: 1) the proposal of a new approach capable of generating more dependable emotional ratings for both arousal and valence from multiple annotators by extracting consistent annotation features; 2) the exploration of the valence and arousal distribution using outlier detection methods, which reveals a distinctive oblique elliptic shape. With the learned distribution, we are able to detect prediction outliers based on their local density deviations and correct them towards the learned distribution. The performance of the proposed method is evaluated on the RECOLA database containing audio, video and physiological recordings. Our results show that a moving average filter is sufficient to remove the incidental errors in annotations. Unsupervised dimensionality reduction approaches can be used to determine gold standard annotations from multiple annotations. Compared with the baseline model of AVEC 2018, our approach significantly improved the concordance correlation coefficient of the arousal and valence predictions to 0.821 and 0.589, respectively.
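The pipeline summarised in the abstract can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes scikit-learn's FeatureAgglomeration as the unsupervised reduction step that merges annotators into a gold standard, LocalOutlierFactor as the local-density outlier detector, and synthetic annotator traces in place of RECOLA data.

```python
# Sketch (not the paper's exact method): smooth each annotator's trace with a
# moving average, agglomerate the annotators into one gold-standard trace,
# evaluate with the concordance correlation coefficient (CCC), and flag
# outliers in the valence/arousal plane by local density deviation.
import numpy as np
from sklearn.cluster import FeatureAgglomeration
from sklearn.neighbors import LocalOutlierFactor


def moving_average(trace, window=25):
    """Smooth a single annotation trace with a centred moving average."""
    kernel = np.ones(window) / window
    return np.convolve(trace, kernel, mode="same")


def concordance_cc(x, y):
    """Concordance correlation coefficient between two 1-D series."""
    x_mean, y_mean = x.mean(), y.mean()
    covariance = np.mean((x - x_mean) * (y - y_mean))
    return 2 * covariance / (x.var() + y.var() + (x_mean - y_mean) ** 2)


# Synthetic example: 6 annotators rating 1000 time steps of one dimension.
rng = np.random.default_rng(0)
latent = np.sin(np.linspace(0, 6 * np.pi, 1000))           # latent emotion
annotations = latent[None, :] + 0.3 * rng.normal(size=(6, 1000))

# 1) Denoise each annotator with the moving-average filter.
smoothed = np.array([moving_average(a) for a in annotations])

# 2) Feature agglomeration over annotators: treat each annotator as one
#    feature and merge all of them into a single cluster (the gold standard).
agglo = FeatureAgglomeration(n_clusters=1)
gold_standard = agglo.fit_transform(smoothed.T).ravel()

# 3) Evaluate a (here: trivial) prediction against the gold standard.
prediction = moving_average(annotations[0])
print("CCC:", concordance_cc(prediction, gold_standard))

# 4) Local-density outlier detection in the valence/arousal plane. LOF is an
#    assumption standing in for the paper's "local density deviations".
valence_arousal = np.column_stack([gold_standard, np.roll(gold_standard, 50)])
lof = LocalOutlierFactor(n_neighbors=35)
labels = lof.fit_predict(valence_arousal)                   # -1 marks outliers
print("outliers flagged:", int((labels == -1).sum()))
```

With a single cluster and mean pooling, the agglomeration step reduces to averaging the smoothed traces; the paper's actual consistency-based feature extraction may weight annotators differently. The CCC rewards both correlation and agreement in scale and offset, which is why it is the challenge metric for continuous valence and arousal prediction.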