{"title":"A Text Independent Speech Emotion Recognition Based on Convolutional Neural Network","authors":"Seme Sarker, Khadija Akter, Nursadul Mamun","doi":"10.1109/ECCE57851.2023.10101666","DOIUrl":null,"url":null,"abstract":"With the advancement of deep learning approaches, the performance of speech emotion recognition (SER) has shown significant improvements. However, system performance degrades substantially when number of emotional states increased. Therefore, this study proposes a text independent SER system that can classify eight emotional states. The proposed system uses joint Mel frequency cepstral coefficient (MFCC) and Log-Mel spectrogram (LMS) to represent the speech signals and a convolutional neural network (CNN) to classify these features in to different emotional states. Results show that the proposed system can achieve an average accuracy of 93%. Two widely used datasets RAVDSESS and TESS have been used in this work to test the model performance. Experimental results present that the proposed framework can achieve significant improvement using a joint feature of MFCC and LMS. Furthermore, the proposed network outperforms state-of-art networks in terms of classification accuracy. This network could be reliably applied to recognize emotion from speech in naturalistic environment.","PeriodicalId":131537,"journal":{"name":"2023 International Conference on Electrical, Computer and Communication Engineering (ECCE)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 International Conference on Electrical, Computer and Communication Engineering (ECCE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ECCE57851.2023.10101666","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
With the advancement of deep learning approaches, the performance of speech emotion recognition (SER) has improved significantly. However, system performance degrades substantially as the number of emotional states increases. Therefore, this study proposes a text-independent SER system that can classify eight emotional states. The proposed system uses joint Mel frequency cepstral coefficient (MFCC) and Log-Mel spectrogram (LMS) features to represent the speech signal and a convolutional neural network (CNN) to classify these features into the different emotional states. Results show that the proposed system achieves an average accuracy of 93%. Two widely used datasets, RAVDESS and TESS, were used to evaluate the model's performance. Experimental results show that the proposed framework achieves a significant improvement when using the joint MFCC and LMS features. Furthermore, the proposed network outperforms state-of-the-art networks in terms of classification accuracy. This network could be reliably applied to recognize emotion from speech in naturalistic environments.
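To make the described pipeline concrete, the following is a minimal sketch of a joint MFCC + log-Mel-spectrogram front end feeding a small 2-D CNN with eight output classes. It is not the authors' exact architecture: the choice of librosa and PyTorch, the feature sizes (40 coefficients, 40 Mel bands, 128 frames), and the layer configuration are illustrative assumptions.

```python
# Hypothetical sketch: joint MFCC + log-Mel features classified by a small CNN.
# All hyperparameters here are assumptions, not the paper's reported settings.
import numpy as np
import librosa
import torch
import torch.nn as nn


def extract_joint_features(path, sr=22050, n_mfcc=40, n_mels=40, n_frames=128):
    """Return a (2, 40, n_frames) tensor stacking MFCC and log-Mel channels."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)            # (40, T)
    lms = librosa.power_to_db(
        librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels))    # (40, T)

    def fix_len(m):
        # Pad or truncate along the time axis to a fixed number of frames.
        if m.shape[1] < n_frames:
            return np.pad(m, ((0, 0), (0, n_frames - m.shape[1])))
        return m[:, :n_frames]

    joint = np.stack([fix_len(mfcc), fix_len(lms)])                   # (2, 40, 128)
    return torch.tensor(joint, dtype=torch.float32)


class EmotionCNN(nn.Module):
    """Small CNN mapping the joint features to 8 emotional states."""
    def __init__(self, n_classes=8):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(2, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 10 * 32, 128), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(128, n_classes),
        )

    def forward(self, x):  # x: (batch, 2, 40, 128)
        return self.classifier(self.features(x))


# Usage: logits for one utterance (class with the highest logit is the prediction).
# logits = EmotionCNN()(extract_joint_features("speech.wav").unsqueeze(0))
```

Stacking the two feature maps as separate input channels is one simple way to realize a "joint" representation; concatenating them along the frequency axis would be an equally plausible reading of the abstract.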