Joint Discrete and Continuous Emotion Prediction Using Ensemble and End-to-End Approaches

Ehab Albadawy, Yelin Kim
DOI: 10.1145/3242969.3242972
Published in: Proceedings of the 20th ACM International Conference on Multimodal Interaction, 2018-10-02
Citations: 5

Abstract

This paper presents a novel approach to continuous emotion prediction that characterizes dimensional emotion labels jointly with continuous and discretized representations. Continuous emotion labels can capture subtle emotion variations, but their inherent noise often has negative effects on model training. Recent approaches found a performance gain when converting the continuous labels into a discrete set (e.g., using k-means clustering), despite the resulting label quantization error. To find the optimal trade-off between the continuous and discretized emotion representations, we investigate two joint modeling approaches: ensemble and end-to-end. The ensemble model combines the predictions from two models that are trained separately, one with discretized prediction and the other with continuous prediction. The end-to-end model, in contrast, is trained to simultaneously optimize both the discretized and continuous prediction tasks as well as the final combination between them. Our experimental results using a state-of-the-art deep BLSTM network on the RECOLA dataset demonstrate that (i) the joint representation outperforms both individual representation baselines and the state-of-the-art speech-based results on RECOLA, validating the assumption that combining continuous and discretized emotion representations yields better performance in emotion prediction; and (ii) the joint representation can help accelerate convergence, particularly for valence prediction. Our work provides insights into joint discrete and continuous emotion representation and its efficacy for describing dynamically changing affective behavior in valence and activation prediction.
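The discretization and fusion steps described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation (which uses a deep BLSTM on RECOLA features); all function names here are hypothetical, and the sketch only shows the two ideas in isolation: discretizing continuous labels with 1-D k-means, and fusing a continuous prediction with a discretized one, as an ensemble might.

```python
# Hypothetical sketch: 1-D k-means discretization of continuous emotion
# labels, plus a simple weighted fusion of continuous and discretized
# predictions. Not the paper's BLSTM pipeline.

def kmeans_1d(values, k, iters=20):
    """Cluster scalar labels into k levels; returns the cluster centers."""
    lo, hi = min(values), max(values)
    # Initialize centers evenly across the label range (e.g. [-1, 1] valence).
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            # Assign each label to its nearest center.
            idx = min(range(k), key=lambda j: abs(v - centers[j]))
            clusters[idx].append(v)
        # Recompute each center as the mean of its assigned labels.
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return centers

def discretize(value, centers):
    """Snap a continuous label/prediction to its nearest cluster center."""
    return min(centers, key=lambda c: abs(value - c))

def fuse(cont_pred, disc_pred, alpha=0.5):
    """Ensemble-style combination of continuous and discretized predictions."""
    return alpha * cont_pred + (1 - alpha) * disc_pred
```

For example, clustering the valence labels `[-0.9, -0.8, 0.0, 0.1, 0.85, 0.9]` with `k=3` yields centers near `-0.85`, `0.05`, and `0.875`; a continuous prediction of `0.07` then snaps to `0.05`, and `fuse` trades off between the noisy continuous estimate and the quantized one. In the paper, the end-to-end variant instead learns this combination jointly with both prediction tasks.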