Performance analysis of spectral and prosodic features and their fusion for emotion recognition in speech
Manish Gaurav
2008 IEEE Spoken Language Technology Workshop, published 2008-12-01
DOI: 10.1109/SLT.2008.4777903 (https://doi.org/10.1109/SLT.2008.4777903)
In this paper, we study the performance of different prosodic and spectral features of speech on an emotion detection task. In particular, a feature selection algorithm is used to assess the relevance of the different features. Gaussian mixture models (GMMs) are used to model the features extracted at the frame level, while support vector machines (SVMs) and k-nearest neighbor (k-NN) methods are used to model the features extracted at the utterance level. We use a normalization approach (T-norm) to combine the scores from the different models. Results are reported on the Berlin emotional database corpus, where the task consists of classifying six emotions: anger, happiness, neutral, sadness, boredom, and anxiety. We show that the feature selection algorithm improves the results, and that fusing the GMM and SVM scores yields an overall accuracy of 75.4% on this task.
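The abstract does not spell out how T-norm is applied, so the following is only a minimal sketch of one plausible reading, borrowed from common speaker-verification practice: each class score for an utterance is normalized by the mean and standard deviation of the scores that the same utterance received from the other class models (the cohort), and the normalized GMM and SVM scores are then combined. The function names `t_norm` and `fuse`, the equal weighting, and the weighted-sum combination rule are all assumptions made for illustration, not details taken from the paper.

```python
from statistics import mean, pstdev

# The six emotion classes named in the abstract.
EMOTIONS = ["anger", "happiness", "neutral", "sadness", "boredom", "anxiety"]


def t_norm(scores, eps=1e-8):
    """Normalize one utterance's per-class scores (assumed T-norm variant).

    Each class score is shifted and scaled by the mean and standard
    deviation of the scores from the *other* class models (the cohort).
    """
    scores = list(scores)
    normed = []
    for i, s in enumerate(scores):
        cohort = scores[:i] + scores[i + 1:]
        # eps guards against a zero standard deviation in the cohort.
        normed.append((s - mean(cohort)) / max(pstdev(cohort), eps))
    return normed


def fuse(gmm_scores, svm_scores, weight=0.5):
    """Combine normalized GMM and SVM scores and return the top emotion.

    `weight` is a hypothetical fusion weight; 0.5 means equal weighting.
    """
    g = t_norm(gmm_scores)
    v = t_norm(svm_scores)
    fused = [weight * a + (1 - weight) * b for a, b in zip(g, v)]
    best = max(range(len(fused)), key=fused.__getitem__)
    return EMOTIONS[best]


if __name__ == "__main__":
    # Made-up scores for a single utterance: GMM log-likelihoods and
    # SVM decision values live on different scales before normalization.
    gmm = [-41.0, -43.5, -42.0, -47.0, -46.0, -44.0]
    svm = [0.2, 0.9, -0.1, -1.2, -0.8, -0.3]
    print(fuse(gmm, svm))
```

Normalizing before combining matters because GMM log-likelihoods and SVM decision values are not directly comparable; after normalization both are expressed relative to how the same utterance scored against the competing classes.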