Application of Machine Learning Algorithms in Speech Emotion Recognition
Junyi Cao
2021 International Conference on Signal Processing and Machine Learning (CONF-SPML), published 2021-11-01
DOI: 10.1109/CONF-SPML54095.2021.00031
Speech emotion recognition has been widely applied in recent years and has become an active research topic. This study compares and analyzes the performance of several models: a convolutional neural network (CNN) that takes spectrograms as input, and CNN-LSTM models based on feature vectors, the original speech signal, and log-mel spectrograms. The comparison shows that the models share some common problems in classification performance. The features and algorithms currently in use can effectively distinguish emotions that differ in “arousal”, but they have difficulty separating emotions with similar arousal. Among the models, the CNN-LSTM model with log-mel spectrograms as input achieved the highest accuracy.
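To make the best-performing input representation concrete, the following is a minimal sketch of how a log-mel spectrogram can be computed from a waveform. It is an illustration only, not the paper's pipeline: the abstract does not give frame sizes, filter counts, or the mel formula, so the HTK-style mel scale, the Hann window, and the values of `n_fft`, `hop`, and `n_mels` are all assumptions, and the function names are hypothetical. A naive DFT from the standard library is used to keep the example self-contained; a practical system would use an FFT.

```python
import cmath
import math


def hz_to_mel(hz):
    # HTK-style mel scale (an assumed choice; other variants exist).
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale, over n_fft // 2 + 1 bins."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 edge frequencies: each filter spans three consecutive edges.
    edges_hz = [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(n_mels):
        lo, mid, hi = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def power_spectrum(frame):
    """Naive O(n^2) DFT power spectrum of one real-valued frame."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def log_mel_spectrogram(signal, sample_rate, n_fft=64, hop=32, n_mels=8, eps=1e-10):
    """Return a list of frames, each a list of n_mels log mel-band energies."""
    # Hann window applied to every frame before the DFT.
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * t / n_fft) for t in range(n_fft)]
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = [signal[start + t] * window[t] for t in range(n_fft)]
        power = power_spectrum(frame)
        mel_energies = [sum(w * p for w, p in zip(row, power)) for row in bank]
        # eps keeps the logarithm finite for empty bands.
        frames.append([math.log(e + eps) for e in mel_energies])
    return frames


# Toy input (not speech): a 440 Hz tone sampled at 8 kHz for 0.1 s.
sr = 8000
tone = [math.sin(2.0 * math.pi * 440.0 * t / sr) for t in range(800)]
features = log_mel_spectrogram(tone, sr)
```

The result is a time-by-frequency matrix (frames by mel bands). In a CNN-LSTM of the kind the abstract describes, a matrix like this would typically be treated as an image-like input to the convolutional layers, with the recurrent layers then modeling how the extracted features change over time.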