Speech Emotion Recognition Method Using Depth Wavefield Extrapolation and Improved Wave Physics Model
Chunjun Zheng, Chunli Wang, Ning Jia
2021 2nd International Conference on E-Commerce and Internet Technology (ECIT), March 2021
DOI: 10.1109/ECIT52743.2021.00081
Speech emotion recognition (SER) performed directly on acoustic waveforms typically involves a very large number of features, which makes it difficult to improve recognition accuracy. In this paper, we propose a new SER method based on depth wavefield extrapolation and an improved wave physics model (DWE-WPM). The method is designed to mitigate the accuracy loss and feature explosion that occur during feature extraction. The model's structure is derived from a wave physics system. We first extrapolate the wavefield in fixed depth steps. We then feed the reconstructed waveform into DWE-WPM, which simulates the information-mining process of a Long Short-Term Memory recurrent neural network (LSTM). The output features of this model are fused with the sorted HSF features. Finally, the fused features are fed into a BiMLSTM, which completes the SER task automatically. Extensive experiments were carried out on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) emotion corpus. The results show that the proposed method improves unweighted average (UA) accuracy by 21%, outperforming existing SER methods that operate on raw waveforms. This demonstrates the method's effectiveness for the SER task.
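The abstract does not give the equations of DWE-WPM, so the following is only a rough illustration of how a wave physics model can play the role of a recurrent network. It is an assumption, not the authors' model. The sketch discretizes a damped 1D scalar wave equation, u_tt + b·u_t = c²·u_xx + f, with an explicit leapfrog scheme:

- The pair (u_{t-1}, u_t) acts as the hidden state carried from step to step, as in an RNN.
- Each waveform sample is injected as the source term f at one node.
- The time-integrated intensity at a few probe nodes serves as the output feature vector.

The grid size, source and probe positions, damping `b`, and the concatenation used in the hypothetical `fuse` step are illustrative choices. The depth-extrapolation stage and the BiMLSTM classifier are not modeled.

```python
import math
from typing import List, Sequence


def wave_step(u_prev: List[float], u_curr: List[float], src: int, f: float,
              c: float = 1.0, dt: float = 0.5, dx: float = 1.0,
              b: float = 0.0) -> List[float]:
    """One leapfrog step of the damped 1D wave equation u_tt + b*u_t = c^2*u_xx + f.

    Both endpoints are held at 0 (fixed/Dirichlet boundaries).
    """
    n = len(u_curr)
    r2 = (c * dt / dx) ** 2          # squared Courant number; keep c*dt/dx <= 1
    a_plus = 1.0 + b * dt / 2.0      # damping terms from the centered u_t estimate
    a_minus = 1.0 - b * dt / 2.0
    u_next = [0.0] * n               # endpoints are never updated, so they stay 0
    for i in range(1, n - 1):
        lap = u_curr[i + 1] - 2.0 * u_curr[i] + u_curr[i - 1]
        force = dt * dt * f if i == src else 0.0   # input injected at the source node only
        u_next[i] = (2.0 * u_curr[i] - a_minus * u_prev[i] + r2 * lap + force) / a_plus
    return u_next


def wave_features(signal: Sequence[float], n_nodes: int = 32, src: int = 4,
                  probes: Sequence[int] = (12, 20, 28), **kw) -> List[float]:
    """Drive the medium with `signal` at node `src` (one sample per time step).

    Returns the time-integrated intensity sum_t u_t[p]^2 at each probe,
    normalized to sum to 1 (all zeros if no energy reaches the probes).
    """
    assert 0 < src < n_nodes - 1, "source must be an interior node"
    u_prev = [0.0] * n_nodes
    u_curr = [0.0] * n_nodes
    energy = [0.0] * len(probes)
    for x in signal:
        # (u_prev, u_curr) is the recurrent hidden state updated by each input sample
        u_next = wave_step(u_prev, u_curr, src, x, **kw)
        u_prev, u_curr = u_curr, u_next
        for k, p in enumerate(probes):
            energy[k] += u_curr[p] ** 2
    total = sum(energy)
    return [e / total for e in energy] if total > 0 else energy


def fuse(wave_feats: Sequence[float], hsf_feats: Sequence[float]) -> List[float]:
    """Hypothetical fusion step: plain concatenation of the two feature vectors."""
    return list(wave_feats) + list(hsf_feats)


if __name__ == "__main__":
    sr = 8000.0
    sig = [math.sin(2 * math.pi * 440.0 * t / sr) for t in range(400)]
    wf = wave_features(sig, b=0.05)
    hsf = [0.1, 0.2]   # hypothetical placeholder for precomputed HSF values
    print(fuse(wf, hsf))
```

Keeping c·dt/dx at or below 1 satisfies the CFL stability condition for this explicit scheme. With larger values, the simulated field grows without bound.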