{"title":"Classifying emotions in human-machine spoken dialogs","authors":"C. Lee, Shrikanth S. Narayanan, R. Pieraccini","doi":"10.1109/ICME.2002.1035887","DOIUrl":null,"url":null,"abstract":"This paper reports on the comparison between various acoustic feature sets and classification algorithms for classifying spoken utterances based on the emotional state of the speaker. The data set used for the analysis comes from a corpus of human-machine dialogs obtained from a commercial application. Emotion recognition is posed as a pattern recognition problem. We used three different techniques - linear discriminant classifier (LDC), k-nearest neighborhood (k-NN) classifier, and support vector machine classifier (SVC) -for classifying utterances into 2 emotion classes: negative and non-negative. In this study, two feature sets were used; the base feature set obtained from the utterance-level statistics of the pitch and energy of the speech, and the feature set analyzed by principal component analysis (PCA). PCA showed a performance comparable to the base feature sets. Overall, the LDC achieved the best performance with error rates of 27.54% on female data and 25.46% on males with the base feature set. The SVC, however, showed a better performance in the problem of data sparsity.","PeriodicalId":90694,"journal":{"name":"Proceedings. IEEE International Conference on Multimedia and Expo","volume":"3 1","pages":"737-740 vol.1"},"PeriodicalIF":0.0000,"publicationDate":"2002-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"75","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. IEEE International Conference on Multimedia and Expo","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICME.2002.1035887","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 75
Abstract
This paper compares various acoustic feature sets and classification algorithms for classifying spoken utterances by the emotional state of the speaker. The data set used for the analysis comes from a corpus of human-machine dialogs collected from a commercial application. Emotion recognition is posed as a pattern recognition problem. We used three different techniques - a linear discriminant classifier (LDC), a k-nearest neighbor (k-NN) classifier, and a support vector machine classifier (SVC) - to classify utterances into two emotion classes: negative and non-negative. Two feature sets were used: a base feature set obtained from utterance-level statistics of the pitch and energy of the speech, and a reduced set derived from it by principal component analysis (PCA). The PCA features performed comparably to the base feature set. Overall, the LDC achieved the best performance, with error rates of 27.54% on female data and 25.46% on male data using the base feature set. The SVC, however, proved more robust to data sparsity.
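To make the pipeline the abstract describes concrete, here is a minimal sketch in Python with scikit-learn: utterance-level pitch/energy statistics feed either directly, or after PCA reduction, into the three classifiers compared in the paper. This is not the authors' implementation; the synthetic feature matrix, the choice of k=5, the RBF kernel, and the number of PCA components are all placeholder assumptions for illustration.

```python
# Sketch of the abstract's pipeline: base features -> (optional) PCA ->
# LDC / k-NN / SVC, comparing test error rates. Data and hyperparameters
# below are assumptions, not the paper's actual corpus or settings.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stand-in for the base feature set: per-utterance statistics (e.g. mean,
# max, min, range, std) of pitch and energy contours. Random placeholders.
X = rng.normal(size=(500, 10))      # 500 utterances, 10 base features
y = rng.integers(0, 2, size=500)    # 0 = non-negative, 1 = negative

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0
)

classifiers = {
    "LDC": LinearDiscriminantAnalysis(),
    "k-NN": KNeighborsClassifier(n_neighbors=5),  # k=5 is an assumption
    "SVC": SVC(kernel="rbf"),                     # kernel is an assumption
}

for use_pca in (False, True):
    for name, clf in classifiers.items():
        steps = [StandardScaler()]
        if use_pca:
            # PCA-reduced feature set; component count is an assumption.
            steps.append(PCA(n_components=5))
        steps.append(clf)
        pipe = make_pipeline(*steps)
        pipe.fit(X_tr, y_tr)
        err = 1.0 - pipe.score(X_te, y_te)  # classification error rate
        print(f"{name:5s} PCA={use_pca}: error = {err:.2%}")
```

On real data, the feature matrix would come from a pitch/energy tracker run over each utterance, and the error rates printed here would be compared per gender, as in the paper's reported 27.54% (female) and 25.46% (male) results for the LDC.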