Speech emotion identification analysis based on different spectral feature extraction methods
N. Kamaruddin, Abdul Wahab Abdul Rahman, N. Abdullah
The 5th International Conference on Information and Communication Technology for The Muslim World (ICT4M), 2014. DOI: 10.1109/ICT4M.2014.7020588
Citations: 7
Abstract
Human speech conveys both the semantic content of the uttered words and the underlying emotional state of the speaker. Emotion identification is important because it can add features to many applications that improve human-computer interaction; such improvements can help retain customer satisfaction and loyalty in the long run and serve as an attraction factor for new customers. Although researchers have applied many approaches to recognizing emotion from speech, none can claim that their findings are superior, because different feature extraction methods coupled with various classifiers may yield different performance depending on the data used. This paper presents a comparative analysis of a speech emotion identification system using two feature extraction methods, Mel Frequency Cepstral Coefficients (MFCC) and Linear Prediction Coefficients (LPC), coupled with a Multilayer Perceptron (MLP) classifier. For further exploration, different numbers of MFCC filters are employed to observe the performance of the proposed system. The results indicate that MFCC-40 performs slightly better than the other MFCC configurations on the Berlin EMO-DB and NTU_American datasets, whereas MFCC-20 performs best on NTU_Asian. It is also observed that MFCC consistently outperformed LPC in all experiments, in line with many reported findings. This understanding can be extended to further study of speech emotion, in order to develop more robust, lower-error systems in the future.
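As an illustrative sketch (not the authors' implementation), the pipeline the abstract describes can be approximated in Python with librosa and scikit-learn: MFCC features with a configurable number of mel filters (the MFCC-20/MFCC-40 variants), LPC coefficients as the alternative feature set, and an MLP classifier. The file paths, labels, sampling rate, LPC order, pooling strategy, and network size below are assumptions for demonstration only.

```python
# Illustrative sketch only, not the authors' code: comparing MFCC and LPC
# features with an MLP classifier, roughly following the paper's setup.
# Assumed details: librosa + scikit-learn, 16 kHz audio, mean-pooled MFCCs,
# LPC order 12, and hypothetical file paths and labels.
import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier

def mfcc_vector(path, n_filters=40, n_coeff=13):
    # n_filters is the number of mel filters, i.e. the MFCC-20 / MFCC-40
    # variants compared in the paper; frames are mean-pooled into one vector.
    y, sr = librosa.load(path, sr=16000)
    m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_coeff, n_mels=n_filters)
    return m.mean(axis=1)

def lpc_vector(path, order=12):
    # Linear prediction coefficients of the whole utterance; the order is
    # an assumption, as the paper's exact LPC settings are not given here.
    y, _ = librosa.load(path, sr=16000)
    return librosa.lpc(y, order=order)[1:]  # drop the leading 1.0

# Hypothetical utterances with EMO-DB-style emotion labels.
files = ["speech/anger_01.wav", "speech/happy_01.wav", "speech/sad_01.wav"]
labels = ["anger", "happiness", "sadness"]

X = np.vstack([mfcc_vector(f, n_filters=40) for f in files])  # or lpc_vector(f)
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
clf.fit(X, labels)
print(clf.predict(X))  # in practice, evaluate on a held-out test split
```

Swapping lpc_vector in for mfcc_vector reproduces the LPC arm of the comparison; in a real replication, the frame-level features and MLP topology should follow the paper's actual configuration rather than the placeholder values used here.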