Speech emotion identification analysis based on different spectral feature extraction methods
N. Kamaruddin, Abdul Wahab Abdul Rahman, N. Abdullah
The 5th International Conference on Information and Communication Technology for The Muslim World (ICT4M), 2014. DOI: 10.1109/ICT4M.2014.7020588
Citations: 7
Abstract
Human speech conveys both the semantic content of the uttered words and the underlying emotional state of the speaker. Emotion identification is important because it can add features to many applications that improve human-computer interaction; such improvements can help retain customer satisfaction and loyalty in the long run and serve as an attraction factor for new customers. Although researchers have applied many approaches to recognizing emotion from speech, none can claim that their findings are superior, because different feature extraction methods coupled with various classifiers may yield different performance depending on the data used. This paper presents a comparative analysis of a speech emotion identification system using two feature extraction methods, Mel Frequency Cepstral Coefficients (MFCC) and Linear Prediction Coefficients (LPC), coupled with a Multilayer Perceptron (MLP) classifier. For further exploration, different numbers of MFCC filters are employed to observe the performance of the proposed system. The results indicate that MFCC-40 performs slightly better than the other MFCC configurations on the Berlin EMO-DB and NTU_American datasets, whereas MFCC-20 performs best on NTU_Asian. It is also observed that MFCC consistently outperformed LPC in all experiments, in line with many reported findings. This understanding can be extended to further study of speech emotion, in order to develop more robust, lower-error systems in the future.
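As an illustrative sketch (not the authors' implementation), the pipeline the abstract describes can be approximated in Python with librosa and scikit-learn: MFCC features with a configurable number of mel filters (the MFCC-20/MFCC-40 variants), LPC coefficients as the alternative feature set, and an MLP classifier. The file paths, labels, sampling rate, LPC order, pooling strategy, and network size below are assumptions for demonstration only.

```python
# Illustrative sketch only, not the authors' code: comparing MFCC and LPC
# features with an MLP classifier, roughly following the paper's setup.
# Assumed details: librosa + scikit-learn, 16 kHz audio, mean-pooled MFCCs,
# LPC order 12, and hypothetical file paths and labels.
import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier

def mfcc_vector(path, n_filters=40, n_coeff=13):
    # n_filters is the number of mel filters, i.e. the MFCC-20 / MFCC-40
    # variants compared in the paper; frames are mean-pooled into one vector.
    y, sr = librosa.load(path, sr=16000)
    m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_coeff, n_mels=n_filters)
    return m.mean(axis=1)

def lpc_vector(path, order=12):
    # Linear prediction coefficients of the whole utterance; the order is
    # an assumption, as the paper's exact LPC settings are not given here.
    y, _ = librosa.load(path, sr=16000)
    return librosa.lpc(y, order=order)[1:]  # drop the leading 1.0

# Hypothetical utterances with EMO-DB-style emotion labels.
files = ["speech/anger_01.wav", "speech/happy_01.wav", "speech/sad_01.wav"]
labels = ["anger", "happiness", "sadness"]

X = np.vstack([mfcc_vector(f, n_filters=40) for f in files])  # or lpc_vector(f)
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
clf.fit(X, labels)
print(clf.predict(X))  # in practice, evaluate on a held-out test split
```

Swapping lpc_vector in for mfcc_vector reproduces the LPC arm of the comparison; in a real replication, the frame-level features and MLP topology should follow the paper's actual configuration rather than the placeholder values used here.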