Amiya Samantaray, K. Mahapatra, Bibek Kabi, A. Routray
{"title":"A novel approach of speech emotion recognition with prosody, quality and derived features using SVM classifier for a class of North-Eastern Languages","authors":"Amiya Samantaray, K. Mahapatra, Bibek Kabi, A. Routray","doi":"10.1109/ReTIS.2015.7232907","DOIUrl":null,"url":null,"abstract":"Speech emotion recognition is one of the recent challenges in speech processing and Human Computer Interaction (HCI) in order to address various operational needs for the real world applications. Besides human facial expressions, speech has been proven to be one of the most precious modalities for automatic recognition of human emotions. Speech is a spontaneous medium of perceiving emotions which provides in-depth information related to different cognitive states of a human being. In this context, a novel approach is being introduces using a combination of prosody features (i.e. pitch, energy, Zero crossing rate), quality features (i.e. Formant Frequencies, Spectral features etc.), derived features (i.e. Mel-Frequency Cepstral Coefficient (MFCC), Linear Predictive Coding Coefficients (LPCC)) and dynamic feature (Mel-Energy spectrum dynamic Coefficients (MEDC)) for robust automatic recognition of speaker's state of emotion. Multilevel SVM classifier is used for identification of seven discrete emotional states namely anger, disgust, fear, happy, neutral, sad and surprise in `Five native Assamese Languages'. 
The overall results of the conducted experiments revealed that the approach of using the combination of features achieved an average accuracy rate of 82.26% for speaker independent cases.","PeriodicalId":161306,"journal":{"name":"2015 IEEE 2nd International Conference on Recent Trends in Information Systems (ReTIS)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"17","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE 2nd International Conference on Recent Trends in Information Systems (ReTIS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ReTIS.2015.7232907","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 17
Abstract
Speech emotion recognition is one of the recent challenges in speech processing and Human-Computer Interaction (HCI), addressing various operational needs of real-world applications. Besides human facial expressions, speech has proven to be one of the most valuable modalities for automatic recognition of human emotions. Speech is a spontaneous medium for perceiving emotions and provides in-depth information about the different cognitive states of a human being. In this context, a novel approach is introduced that uses a combination of prosody features (pitch, energy, zero-crossing rate), quality features (formant frequencies, spectral features, etc.), derived features (Mel-Frequency Cepstral Coefficients (MFCC), Linear Predictive Coding Coefficients (LPCC)), and a dynamic feature (Mel-Energy spectrum Dynamic Coefficients (MEDC)) for robust automatic recognition of a speaker's emotional state. A multilevel SVM classifier is used to identify seven discrete emotional states, namely anger, disgust, fear, happiness, neutral, sadness and surprise, in five native Assamese languages. The overall results of the conducted experiments revealed that the combination of features achieved an average accuracy of 82.26% in speaker-independent cases.
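Two of the prosody features the abstract lists, short-time energy and zero-crossing rate, are straightforward to compute from framed audio. The sketch below is a minimal illustration of that framing-and-feature step, not the authors' actual pipeline; the frame length (25 ms), hop (10 ms), and the synthetic test tone are assumptions for demonstration only.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (rows)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

def short_time_energy(frames):
    """Mean squared amplitude per frame."""
    return np.mean(frames ** 2, axis=1)

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs whose sign differs, per frame."""
    signs = np.sign(frames)
    return np.mean(np.abs(np.diff(signs, axis=1)) > 0, axis=1)

# Illustrative input: 1 s of a 220 Hz tone at 16 kHz (a stand-in for speech)
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 220 * t)

# 25 ms frames with a 10 ms hop -- common choices in speech front-ends
frames = frame_signal(x, frame_len=int(0.025 * sr), hop=int(0.010 * sr))
energy = short_time_energy(frames)   # one energy value per frame
zcr = zero_crossing_rate(frames)     # one ZCR value per frame
```

In a full system such per-frame features would be summarized (means, variances, contours) and concatenated with the quality and derived features before being fed to the multilevel SVM classifier.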