{"title":"基于卷积神经网络的文本独立语音情感识别","authors":"Seme Sarker, Khadija Akter, Nursadul Mamun","doi":"10.1109/ECCE57851.2023.10101666","DOIUrl":null,"url":null,"abstract":"With the advancement of deep learning approaches, the performance of speech emotion recognition (SER) has shown significant improvements. However, system performance degrades substantially when number of emotional states increased. Therefore, this study proposes a text independent SER system that can classify eight emotional states. The proposed system uses joint Mel frequency cepstral coefficient (MFCC) and Log-Mel spectrogram (LMS) to represent the speech signals and a convolutional neural network (CNN) to classify these features in to different emotional states. Results show that the proposed system can achieve an average accuracy of 93%. Two widely used datasets RAVDSESS and TESS have been used in this work to test the model performance. Experimental results present that the proposed framework can achieve significant improvement using a joint feature of MFCC and LMS. Furthermore, the proposed network outperforms state-of-art networks in terms of classification accuracy. This network could be reliably applied to recognize emotion from speech in naturalistic environment.","PeriodicalId":131537,"journal":{"name":"2023 International Conference on Electrical, Computer and Communication Engineering (ECCE)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Text Independent Speech Emotion Recognition Based on Convolutional Neural Network\",\"authors\":\"Seme Sarker, Khadija Akter, Nursadul Mamun\",\"doi\":\"10.1109/ECCE57851.2023.10101666\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"With the advancement of deep learning approaches, the performance of speech emotion recognition (SER) has shown significant improvements. However, system performance degrades substantially when number of emotional states increased. Therefore, this study proposes a text independent SER system that can classify eight emotional states. The proposed system uses joint Mel frequency cepstral coefficient (MFCC) and Log-Mel spectrogram (LMS) to represent the speech signals and a convolutional neural network (CNN) to classify these features in to different emotional states. Results show that the proposed system can achieve an average accuracy of 93%. Two widely used datasets RAVDSESS and TESS have been used in this work to test the model performance. Experimental results present that the proposed framework can achieve significant improvement using a joint feature of MFCC and LMS. Furthermore, the proposed network outperforms state-of-art networks in terms of classification accuracy. 
This network could be reliably applied to recognize emotion from speech in naturalistic environment.\",\"PeriodicalId\":131537,\"journal\":{\"name\":\"2023 International Conference on Electrical, Computer and Communication Engineering (ECCE)\",\"volume\":\"32 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-02-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 International Conference on Electrical, Computer and Communication Engineering (ECCE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ECCE57851.2023.10101666\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 International Conference on Electrical, Computer and Communication Engineering (ECCE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ECCE57851.2023.10101666","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A Text Independent Speech Emotion Recognition Based on Convolutional Neural Network
With the advancement of deep learning approaches, the performance of speech emotion recognition (SER) has improved significantly. However, system performance degrades substantially as the number of emotional states increases. This study therefore proposes a text-independent SER system that can classify eight emotional states. The proposed system uses joint Mel-frequency cepstral coefficients (MFCC) and the log-Mel spectrogram (LMS) to represent the speech signal, and a convolutional neural network (CNN) to classify these features into different emotional states. Two widely used datasets, RAVDESS and TESS, are used in this work to evaluate the model. Results show that the proposed system achieves an average accuracy of 93%, and that the joint MFCC and LMS feature yields a significant improvement. Furthermore, the proposed network outperforms state-of-the-art networks in terms of classification accuracy. This network could be reliably applied to recognize emotion from speech in naturalistic environments.
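As a rough illustration of the pipeline the abstract describes (a joint MFCC and log-Mel-spectrogram representation fed to a CNN with an eight-way softmax), the sketch below shows one way such a front end and classifier could be assembled with librosa and Keras. The paper does not specify feature dimensions or the network architecture, so the coefficient counts, the fixed frame length, and the layer configuration here are illustrative assumptions, not the authors' configuration.

```python
# Minimal sketch, assuming a stacked MFCC + log-Mel front end and a small 2-D CNN.
# N_MFCC, N_MELS, N_FRAMES and the layer sizes are assumptions for illustration.
import numpy as np
import librosa
import tensorflow as tf

N_MFCC = 40      # assumed number of MFCC coefficients
N_MELS = 128     # assumed number of Mel bands for the log-Mel spectrogram
N_FRAMES = 128   # assumed fixed frame count (clips are padded/truncated to this)
N_CLASSES = 8    # eight emotional states, as stated in the abstract

def extract_joint_features(path, sr=22050):
    """Stack MFCCs and the log-Mel spectrogram along the frequency axis."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC)           # (N_MFCC, T)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=N_MELS)  # (N_MELS, T)
    log_mel = librosa.power_to_db(mel)
    feat = np.concatenate([mfcc, log_mel], axis=0)                   # (N_MFCC+N_MELS, T)
    # Pad or truncate to a fixed number of frames so every clip has the same shape.
    if feat.shape[1] < N_FRAMES:
        feat = np.pad(feat, ((0, 0), (0, N_FRAMES - feat.shape[1])))
    else:
        feat = feat[:, :N_FRAMES]
    return feat[..., np.newaxis]                                     # add channel dim

def build_cnn():
    """Small 2-D CNN over the joint time-frequency representation."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(N_MFCC + N_MELS, N_FRAMES, 1)),
        tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(N_CLASSES, activation="softmax"),
    ])

model = build_cnn()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

In this sketch the two feature types are simply concatenated along the frequency axis into a single image-like input; other fusion strategies (e.g. separate channels or separate branches) would also be consistent with the abstract's description of a "joint" MFCC and LMS feature.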