Xiao Chen, Haifeng Li, Lin Ma, Xinlei Liu, Jing Chen
Published in: 2015 Fifth International Conference on Instrumentation and Measurement, Computer, Communication and Control (IMCCC), 2015-09-01
DOI: 10.1109/IMCCC.2015.239
Citations: 6
Teager Mel and PLP Fusion Feature Based Speech Emotion Recognition
Although a number of features derived from linear speech production theory have been investigated as indicators of emotion in speech, recognition accuracy remains unsatisfactory for realistic applications. This paper proposes Teager Mel, a novel speech emotion feature based on the Teager Energy Operator (TEO) and Mel perceptual characteristics. Because it is nonlinear and simple, the TEO appears well suited to describing emotion in speech. From the standpoint of auditory psychophysics, Perceptual Linear Predictive (PLP) features are also investigated as an extension to Teager Mel. A Support Vector Machine (SVM) classifier is then applied to the fusion of Teager Mel and PLP features on a Chinese discrete emotional speech corpus (Dis-EC) covering four emotions: happiness, anger, sorrow, and surprise. Compared with previous studies based on prosodic features, Teager Mel features improve recognition accuracy by 10.4%, and PLP features by 8.2%. With the fused features, recognition accuracy reaches 79.7%, which appears to be the most attractive result among related studies.
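As a minimal sketch of the operator underlying Teager Mel, the standard discrete form of the TEO is ψ[x(n)] = x(n)² − x(n−1)·x(n+1). The Python below implements only that operator and checks it on a pure tone. The Mel filtering stage, PLP extraction, and the SVM are not reproduced here. The function names are illustrative, and `fuse_features` assumes feature-level fusion by simple concatenation, which the abstract does not specify.

```python
import math


def teager_energy(x):
    """Discrete Teager Energy Operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns len(x) - 2 values, one for each interior sample of x.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def fuse_features(teager_mel, plp):
    """Hypothetical fusion step: concatenate the two feature vectors.

    Concatenation is an assumption made for illustration; the abstract
    does not state how the Teager Mel and PLP features are fused.
    """
    return list(teager_mel) + list(plp)


# Sanity check on a pure tone: for x[n] = A * cos(w * n),
# the TEO output is the constant A^2 * sin(w)^2.
A, w = 0.5, 0.3
tone = [A * math.cos(w * n) for n in range(64)]
psi = teager_energy(tone)
expected = (A * math.sin(w)) ** 2
```

For a pure tone the operator returns a constant that depends on both amplitude and frequency, which is one reason it is used as an energy measure that is sensitive to how the signal was produced rather than to amplitude alone.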