{"title":"A Research of Speech Emotion Recognition Based on CNN Network","authors":"Anurish Gangrade, Shalini Singhal","doi":"10.47904/ijskit.12.1.2022.24-31","DOIUrl":null,"url":null,"abstract":"- This paper proposed a novel method of feature extraction, using DBNs in DNN to automatically extract emotional options from speech signals. Speech emotion recognition relies heavily on feature extraction, which is why the paper focused on this aspect of the problem. Feature extraction is an essential component of the speech emotion recognition process. To extract speech emotion features, we used a 9-layer depth DBN, and we included numerous consecutive frames into the process to produce a high-dimensional feature. An improved CNN model is presented in this article. This model consists of a combination of convolution 1d layers and has been generalized to form a 9-layer architecture of CNN (convolutional neural network). The model accuracy has been checked with respect to emotion classes such as considering 5 emotions such as angry, calm, fearful, happy, and sad for both male and female speakers, and eventually a speech emotion recognition multiple classifier system was achieved. The voice emotion recognition rate of the system achieved 89.00 percent, which is around 14 percent more than the traditional approach could get.","PeriodicalId":424149,"journal":{"name":"SKIT Research Journal","volume":"66 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"SKIT Research Journal","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.47904/ijskit.12.1.2022.24-31","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
- This paper proposes a novel feature-extraction method that uses deep belief networks (DBNs) within a DNN to automatically extract emotional features from speech signals. Feature extraction is an essential component of speech emotion recognition, which is why the paper focuses on this part of the problem. To extract speech emotion features, we used a 9-layer deep DBN and stacked numerous consecutive frames to produce a high-dimensional feature vector. The article also presents an improved CNN model: a combination of 1-D convolutional layers generalized into a 9-layer convolutional neural network (CNN). Model accuracy was evaluated on 5 emotion classes (angry, calm, fearful, happy, and sad) for both male and female speakers, and a multiple-classifier speech emotion recognition system was ultimately obtained. The system achieved a speech emotion recognition rate of 89.00 percent, around 14 percent higher than the traditional approach.
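The building block of the model described above is the 1-D convolution applied to a speech signal. As a minimal, self-contained sketch of that operation (not the paper's actual implementation — kernel widths, the ReLU activation, and the toy input below are all assumptions for illustration):

```python
import numpy as np

def conv1d(signal, kernels, stride=1):
    """Valid-mode 1-D convolution producing one feature map per kernel.

    signal:  1-D array of audio samples (one frame)
    kernels: array of shape (n_kernels, width)
    """
    n_k, width = kernels.shape
    out_len = (len(signal) - width) // stride + 1
    out = np.empty((n_k, out_len))
    for k in range(n_k):
        for i in range(out_len):
            out[k, i] = np.dot(signal[i * stride : i * stride + width], kernels[k])
    # ReLU nonlinearity: a common choice after convolution (assumed here)
    return np.maximum(out, 0.0)

# Toy example: a 16-sample frame and two hand-picked 3-tap kernels
frame = np.arange(16, dtype=float)
kernels = np.array([[1.0, 0.0, -1.0],        # difference (edge-like) detector
                    [1/3, 1/3, 1/3]])        # moving-average smoother
maps = conv1d(frame, kernels)
print(maps.shape)  # (2, 14)
```

A deep model like the 9-layer CNN in the paper stacks several such layers (typically with pooling between them), so each successive layer sees increasingly abstract features of the input frame.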