Muhammad Yusup Zakaria, E. C. Djamal, Fikri Nugraha, Fatan Kasyidi
2020 3rd International Conference on Computer and Informatics Engineering (IC2IE), 2020-09-15. DOI: https://doi.org/10.1109/IC2IE50715.2020.9274629
Speech Emotion Identification Using Linear Predictive Coding and Recurrent Neural
Social, affective communication has developed significantly in recent years, especially in the verbal understanding of emotions. People naturally adjust their responses to the actions of their interlocutor. Previous research has shown that neural network architectures can identify emotions from speech, but accuracy has been poor because of imbalanced data and problems in the design of the classification system. This study uses Linear Predictive Coding (LPC), which can represent the pronunciation of a speaker's utterances. Sixteen LPC coefficients form the feature vector used as input for voice emotion identification with a Recurrent Neural Network (RNN). A Long Short-Term Memory (LSTM) or Gated Recurrent Unit (GRU) architecture is used to overcome vanishing or exploding gradients. The identification stage uses forward propagation with a softmax activation function. We conducted a simulation using the RNN for emotion identification. The RNN-GRU trained with Adam optimization at a learning rate of 0.001 achieved an accuracy of 90.93% and a loss of 0.216. In comparison, the RNN-LSTM achieved an accuracy of 87.50% and a loss of 0.262. The experimental results show that the best model is the RNN-GRU with the Adam optimization method. The F-measure obtained is 0.91.
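The abstract states only that 16 LPC coefficients are used as the feature vector fed to the RNN; it does not say how they are computed. The sketch below is one common way to obtain such features, not necessarily the authors' procedure: the autocorrelation method with the Levinson-Durbin recursion, applied to Hamming-windowed frames. The function names (`autocorrelation`, `lpc`, `frame_features`), the frame length of 400 samples, and the hop of 160 samples are all assumptions chosen for illustration.

```python
import math


def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one frame."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def lpc(frame, order=16):
    """Levinson-Durbin recursion (autocorrelation method).

    Returns (coeffs, err), where coeffs[k-1] is a_k in the predictor
    x[n] ~ sum_k a_k * x[n-k] and err is the residual energy.
    """
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        # Stop early if the residual energy has vanished (e.g. a silent
        # frame); the remaining coefficients stay at zero.
        if err <= 1e-12 * r[0]:
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def frame_features(signal, frame_len=400, hop=160, order=16):
    """Split a signal into Hamming-windowed frames and return one
    order-length LPC vector per frame (frame_len and hop are assumed
    values, not taken from the paper)."""
    window = [
        0.54 - 0.46 * math.cos(2.0 * math.pi * n / (frame_len - 1))
        for n in range(frame_len)
    ]
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        coeffs, _ = lpc(frame, order)
        feats.append(coeffs)
    return feats


if __name__ == "__main__":
    # Synthetic stand-in for a speech signal: two sinusoids.
    demo = [math.sin(0.1 * n) + 0.5 * math.sin(0.37 * n) for n in range(1600)]
    sequence = frame_features(demo)
    print(len(sequence), "frames of", len(sequence[0]), "coefficients each")
```

Under these assumptions, each utterance becomes a sequence of 16-dimensional vectors, one per frame, which is the kind of time-ordered input a GRU or LSTM layer followed by a softmax output could consume.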