{"title":"Classifying emotions in human-machine spoken dialogs","authors":"C. Lee, Shrikanth S. Narayanan, R. Pieraccini","doi":"10.1109/ICME.2002.1035887","DOIUrl":null,"url":null,"abstract":"This paper reports on the comparison between various acoustic feature sets and classification algorithms for classifying spoken utterances based on the emotional state of the speaker. The data set used for the analysis comes from a corpus of human-machine dialogs obtained from a commercial application. Emotion recognition is posed as a pattern recognition problem. We used three different techniques - linear discriminant classifier (LDC), k-nearest neighborhood (k-NN) classifier, and support vector machine classifier (SVC) -for classifying utterances into 2 emotion classes: negative and non-negative. In this study, two feature sets were used; the base feature set obtained from the utterance-level statistics of the pitch and energy of the speech, and the feature set analyzed by principal component analysis (PCA). PCA showed a performance comparable to the base feature sets. Overall, the LDC achieved the best performance with error rates of 27.54% on female data and 25.46% on males with the base feature set. The SVC, however, showed a better performance in the problem of data sparsity.","PeriodicalId":90694,"journal":{"name":"Proceedings. IEEE International Conference on Multimedia and Expo","volume":"3 1","pages":"737-740 vol.1"},"PeriodicalIF":0.0000,"publicationDate":"2002-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"75","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. IEEE International Conference on Multimedia and Expo","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICME.2002.1035887","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 75
Abstract
This paper compares various acoustic feature sets and classification algorithms for classifying spoken utterances by the emotional state of the speaker. The data set used for the analysis comes from a corpus of human-machine dialogs collected from a commercial application. Emotion recognition is posed as a pattern recognition problem. We used three different techniques - a linear discriminant classifier (LDC), a k-nearest neighbor (k-NN) classifier, and a support vector machine classifier (SVC) - to classify utterances into two emotion classes: negative and non-negative. Two feature sets were used: a base feature set obtained from utterance-level statistics of the pitch and energy of the speech, and a reduced set derived from it by principal component analysis (PCA). The PCA features performed comparably to the base feature set. Overall, the LDC achieved the best performance, with error rates of 27.54% on female data and 25.46% on male data using the base feature set. The SVC, however, proved more robust to data sparsity.
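To make the pipeline the abstract describes concrete, here is a minimal sketch in Python with scikit-learn: utterance-level pitch/energy statistics feed either directly, or after PCA reduction, into the three classifiers compared in the paper. This is not the authors' implementation; the synthetic feature matrix, the choice of k=5, the RBF kernel, and the number of PCA components are all placeholder assumptions for illustration.

```python
# Sketch of the abstract's pipeline: base features -> (optional) PCA ->
# LDC / k-NN / SVC, comparing test error rates. Data and hyperparameters
# below are assumptions, not the paper's actual corpus or settings.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stand-in for the base feature set: per-utterance statistics (e.g. mean,
# max, min, range, std) of pitch and energy contours. Random placeholders.
X = rng.normal(size=(500, 10))      # 500 utterances, 10 base features
y = rng.integers(0, 2, size=500)    # 0 = non-negative, 1 = negative

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0
)

classifiers = {
    "LDC": LinearDiscriminantAnalysis(),
    "k-NN": KNeighborsClassifier(n_neighbors=5),  # k=5 is an assumption
    "SVC": SVC(kernel="rbf"),                     # kernel is an assumption
}

for use_pca in (False, True):
    for name, clf in classifiers.items():
        steps = [StandardScaler()]
        if use_pca:
            # PCA-reduced feature set; component count is an assumption.
            steps.append(PCA(n_components=5))
        steps.append(clf)
        pipe = make_pipeline(*steps)
        pipe.fit(X_tr, y_tr)
        err = 1.0 - pipe.score(X_te, y_te)  # classification error rate
        print(f"{name:5s} PCA={use_pca}: error = {err:.2%}")
```

On real data, the feature matrix would come from a pitch/energy tracker run over each utterance, and the error rates printed here would be compared per gender, as in the paper's reported 27.54% (female) and 25.46% (male) results for the LDC.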