Dynamic speech emotion recognition with state-space models
Authors: K. Markov, T. Matsui, F. Septier, G. Peters
Venue: 2015 23rd European Signal Processing Conference (EUSIPCO)
Publication date: 2015-12-28
DOI: 10.1109/EUSIPCO.2015.7362750 (https://doi.org/10.1109/EUSIPCO.2015.7362750)
Citations: 7
Abstract
Automatic emotion recognition from speech has focused mainly on identifying categorical or static affect states, but the spectrum of human emotion is continuous and time-varying. In this paper, we present a recognition system for dynamic speech emotion based on state-space models (SSMs). Predicting the unknown emotion trajectory in the affect space spanned by the Arousal, Valence, and Dominance (A-V-D) descriptors is cast as a time-series filtering task. The state-space models we investigated include a standard linear model (Kalman filter) as well as novel non-linear, non-parametric SSMs based on Gaussian processes (GPs). We use the AVEC 2014 database for evaluation. It provides ground-truth A-V-D labels, which allow the state and measurement functions to be learned separately and so simplify model training. For filtering with the GP SSM, we used two approximation methods: a recently proposed analytic method and a particle filter. All models were evaluated in terms of average Pearson correlation R and root mean square error (RMSE). The results show that, using the same feature vectors, the GP SSMs achieve roughly twice the correlation and half the RMSE of a Kalman filter.
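To make the linear baseline and the two evaluation metrics concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single scalar affect dimension (rather than the full three-dimensional A-V-D state), a scalar feature, and synthetic data in place of AVEC 2014. Because labels are available, the state model x_t = a·x_{t-1} + w_t and the measurement model y_t = c·x_t + v_t are fitted separately; using least squares for this is an assumption made here for illustration. All function names (`simulate`, `fit_linear`, `kalman_filter`, `demo`) are hypothetical, and the GP-based models are not shown.

```python
import math
import random


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def rmse(a, b):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def simulate(n, a, q, c, r, rng):
    """Synthetic stand-in for (label, feature) pairs; NOT AVEC 2014 data."""
    x, xs, ys = 0.0, [], []
    for _ in range(n):
        x = a * x + rng.gauss(0.0, math.sqrt(q))         # state transition
        xs.append(x)
        ys.append(c * x + rng.gauss(0.0, math.sqrt(r)))  # noisy measurement
    return xs, ys


def fit_linear(inputs, targets):
    """Least-squares slope through the origin, plus residual variance."""
    slope = sum(i * t for i, t in zip(inputs, targets)) / sum(i * i for i in inputs)
    var = sum((t - slope * i) ** 2 for i, t in zip(inputs, targets)) / len(inputs)
    return slope, var


def kalman_filter(ys, a, q, c, r, m0=0.0, p0=1.0):
    """Scalar Kalman filter; returns the filtered state means."""
    m, p, means = m0, p0, []
    for y in ys:
        m_pred = a * m                         # predict mean
        p_pred = a * a * p + q                 # predict variance
        k = p_pred * c / (c * c * p_pred + r)  # Kalman gain
        m = m_pred + k * (y - c * m_pred)      # update mean
        p = (1.0 - k * c) * p_pred             # update variance
        means.append(m)
    return means


def demo(seed=0):
    """Fit on one synthetic sequence, filter another, report the metrics."""
    rng = random.Random(seed)
    x_train, y_train = simulate(500, 0.95, 0.01, 2.0, 1.0, rng)
    x_test, y_test = simulate(500, 0.95, 0.01, 2.0, 1.0, rng)
    # Labels are observed, so each function is learned on its own.
    a, q = fit_linear(x_train[:-1], x_train[1:])  # state function
    c, r = fit_linear(x_train, y_train)           # measurement function
    est = kalman_filter(y_test, a, q, c, r)
    raw = [y / c for y in y_test]                 # unfiltered inversion of y
    return pearson_r(est, x_test), rmse(est, x_test), rmse(raw, x_test)


if __name__ == "__main__":
    r_filt, rmse_filt, rmse_raw = demo()
    print(f"R = {r_filt:.3f}, RMSE filtered = {rmse_filt:.3f}, RMSE raw = {rmse_raw:.3f}")
```

Calling `demo()` returns the Pearson correlation and RMSE of the filtered trajectory against the held-out labels, along with the RMSE of simply inverting the measurement, which gives a reference point for how much the filtering step helps on this toy problem.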