{"title":"Speech Emotion Recognition Method Using Depth Wavefield Extrapolation and Improved Wave Physics Model","authors":"Chunjun Zheng, Chunli Wang, Ning Jia","doi":"10.1109/ECIT52743.2021.00081","DOIUrl":null,"url":null,"abstract":"Speech emotion recognition(SER) task based on acoustic wave always has a large number of features, which brings great difficulties to improve the accuracy of recognition. In this paper, we propose a new speech emotion recognition method, which is based on depth wavefield extrapolation and improved wave physics model (DWE-WPM). The method can improve loss accuracy and feature explosion problem when extracting the features. The schema comes from the wave physics system. After extrapolating the wavefield with a fixed-step depth, we inject the reconstructed waveform into DWE-WPM to simulate the information mining process of Long Short-Term Memory Recurrent Neural Network(LSTM), and then fuse the output features of this model with the sorted HSF features. Finally, the integrated features are injected into BiMLSTM to automatically complete the SER task. Massive experiments were carried out on the emotion corpus of interactive emotional dyadic motion capture (IEMOCAP). The experimental results showed that the weighted average (UA) accuracy of the proposed method can be improved by 21%, which was better than the existing methods of SER from raw wave. 
The method proposed in the paper proved the effective for SER task.","PeriodicalId":186487,"journal":{"name":"2021 2nd International Conference on E-Commerce and Internet Technology (ECIT)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 2nd International Conference on E-Commerce and Internet Technology (ECIT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ECIT52743.2021.00081","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
The speech emotion recognition (SER) task based on acoustic waveforms involves a large number of features, which makes it difficult to improve recognition accuracy. In this paper, we propose a new speech emotion recognition method based on depth wavefield extrapolation and an improved wave physics model (DWE-WPM). The method mitigates the accuracy loss and feature-explosion problems that arise during feature extraction. The scheme is derived from a wave physics system. After extrapolating the wavefield with a fixed depth step, we inject the reconstructed waveform into the DWE-WPM to simulate the information-mining process of a Long Short-Term Memory recurrent neural network (LSTM), and then fuse the output features of this model with the sorted HSF features. Finally, the integrated features are fed into a BiMLSTM to complete the SER task automatically. Extensive experiments were carried out on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpus. The results show that the unweighted accuracy (UA) of the proposed method improves by 21%, outperforming existing SER methods that operate on the raw waveform. The method proposed in this paper proved effective for the SER task.
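The pipeline the abstract outlines (extrapolate the wavefield at a fixed depth step, reconstruct a waveform, fuse the model's output with sorted per-utterance statistical functionals) can be sketched at a toy level. Every function here is an illustrative assumption, not the paper's implementation: the phase-shift extrapolation operator, the choice of HSF statistics, and the stand-in for the DWE-WPM output are all placeholders.

```python
import numpy as np

def depth_extrapolate(wave, depth_steps=4, step=0.1):
    # Toy fixed-step "depth" extrapolation (assumption): apply a phase shift
    # to the spectrum at each depth step and average the reconstructed fields.
    spec = np.fft.rfft(wave)
    freqs = np.fft.rfftfreq(len(wave))
    field = np.zeros(len(wave))
    for d in range(1, depth_steps + 1):
        shifted = spec * np.exp(-2j * np.pi * freqs * d * step * len(wave))
        field += np.fft.irfft(shifted, n=len(wave))
    return field / depth_steps

def hsf_features(wave):
    # Stand-in high-level statistical functionals (HSF): simple utterance-level
    # statistics, sorted as the abstract describes.
    stats = np.array([wave.mean(), wave.std(), wave.min(), wave.max()])
    return np.sort(stats)

def fuse(model_out, hsf):
    # Concatenate DWE-WPM output features with the sorted HSF vector;
    # the fused vector would then feed the downstream classifier.
    return np.concatenate([model_out, hsf])

rng = np.random.default_rng(0)
wave = rng.standard_normal(1024)
recon = depth_extrapolate(wave)
# First 8 samples of the reconstruction serve as a stand-in for the
# DWE-WPM model output, which the sketch does not implement.
features = fuse(recon[:8], hsf_features(wave))
```

In the paper the fused vector is passed to a BiMLSTM classifier; this sketch stops at feature fusion, since the abstract gives no architectural details for that stage.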