Speech Emotion Recognition Using ANN on MFCC Features

Harshit Dolka, Arul Xavier V M, S. Juliet

2021 3rd International Conference on Signal Processing and Communication (ICPSC), 13 May 2021. DOI: 10.1109/ICSPC51351.2021.9451810
Speech Emotion Recognition (SER) is an active research topic in Human-Computer Interaction. This paper trains an artificial neural network (ANN) model for SER on Mel-Frequency Cepstral Coefficient (MFCC) features and evaluates it on several audio datasets to compare performance. The model classifies audio files into up to eight emotional states: happy, sad, angry, fearful, surprised, disgusted, calm, and neutral, although the number of emotions available varies across the selected datasets. The proposed model achieves an average accuracy of 99.52% on the TESS dataset, 88.72% on the RAVDESS dataset, 71.69% on the CREMA dataset, and 86.80% on the SAVEE dataset.
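The MFCC front end the abstract describes can be sketched as follows. This is a plain-NumPy illustration of the standard MFCC computation (framing, windowing, power spectrum, mel filterbank, log, DCT), not the authors' code; the paper likely used a library such as librosa, and every parameter default here (sample rate, frame size, hop, filter count) is an illustrative assumption.

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, hop=256, n_mels=26, n_mfcc=13):
    """Textbook MFCC computation; all defaults are illustrative assumptions."""
    # 1. Frame the signal and apply a Hamming window.
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] for i in range(n_frames)])
    frames = frames * np.hamming(n_fft)

    # 2. Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft

    # 3. Mel filterbank: triangular filters spaced evenly on the mel scale.
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / max(right - center, 1)

    # 4. Log filterbank energies, then DCT-II to decorrelate; keep n_mfcc coeffs.
    energies = np.log(power @ fbank.T + 1e-10)
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_mfcc), (2 * n + 1) / (2 * n_mels)))
    return energies @ dct.T  # shape: (n_frames, n_mfcc)

# Toy usage: a one-second 440 Hz tone stands in for a speech clip.
sr = 16000
t = np.arange(sr) / sr
feats = mfcc(np.sin(2 * np.pi * 440.0 * t), sr=sr)
print(feats.shape)  # one 13-coefficient MFCC vector per frame
```

In the pipeline the abstract outlines, a per-clip summary of these frame-level vectors (commonly the mean over frames) would then be fed to the ANN classifier, which predicts one of the emotion labels.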