Speech Emotion Recognition based on Interactive Convolutional Neural Network
Authors: Huihui Cheng, Xiaoyu Tang
Published in: 2020 IEEE 3rd International Conference on Information Communication and Signal Processing (ICICSP), 2020-09-01
DOI: 10.1109/ICICSP50920.2020.9232071 (https://doi.org/10.1109/ICICSP50920.2020.9232071)
Citations: 3
Abstract
Speech emotion recognition (SER) plays an indispensable role in intelligent speech applications. Mel-frequency cepstral coefficients (MFCCs), which are rich in frequency characteristics, are widely used as input for SER. However, previous work has been limited by neglecting the interaction between different frequencies in the MFCCs, even though exchanging information across frequencies is critical for generating discriminative emotion feature representations. In this paper, we therefore propose an interactive convolutional neural network (ICNN), in which the input feature map is factorized into different frequency scales for interactive convolution. Extensive experiments were conducted to evaluate the proposed ICNN, and the results show that interactive convolution effectively reduces redundant information in the feature map and improves SER accuracy.
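The abstract does not spell out how the feature map is factorized or how the frequency scales interact, so the following is only an illustrative sketch under stated assumptions, not the authors' ICNN. It assumes two branches (a full-resolution one and one pooled by a factor of 2 along the frequency axis) that exchange information after a small convolution, loosely in the style of a two-branch multi-scale convolution. All function names, the pooling factor, and the shared 1-D kernel are hypothetical choices made for illustration.

```python
from typing import List, Tuple

# A feature map as nested lists: rows are frequency bins, columns are time frames.
Matrix = List[List[float]]


def pool_freq(x: Matrix) -> Matrix:
    """Halve the frequency resolution by averaging adjacent pairs of rows."""
    return [
        [(a + b) / 2.0 for a, b in zip(x[i], x[i + 1])]
        for i in range(0, len(x) - 1, 2)
    ]


def upsample_freq(x: Matrix) -> Matrix:
    """Double the frequency resolution by repeating each row twice."""
    return [list(row) for row in x for _ in range(2)]


def conv_freq(x: Matrix, kernel: List[float]) -> Matrix:
    """Zero-padded 'same' 1-D filter along the frequency axis.

    Implemented as cross-correlation (no kernel flip), as is common in
    deep learning libraries.
    """
    half = len(kernel) // 2
    n_rows, n_cols = len(x), len(x[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < n_rows:
                for t in range(n_cols):
                    out[i][t] += w * x[j][t]
    return out


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two equally shaped matrices."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def interactive_block(
    high: Matrix, low: Matrix, kernel: List[float]
) -> Tuple[Matrix, Matrix]:
    """Assumed interaction: each branch receives its own filtered features
    plus the other branch's features, resampled to its resolution."""
    high_out = add(conv_freq(high, kernel), upsample_freq(conv_freq(low, kernel)))
    low_out = add(conv_freq(low, kernel), conv_freq(pool_freq(high), kernel))
    return high_out, low_out


# Toy input: 4 frequency bins x 2 time frames (stand-in for an MFCC map).
mfcc = [[1.0, 1.0], [3.0, 3.0], [5.0, 5.0], [7.0, 7.0]]
high_branch = mfcc                 # full frequency resolution
low_branch = pool_freq(mfcc)       # coarse frequency scale
high_out, low_out = interactive_block(high_branch, low_branch, [0.25, 0.5, 0.25])
```

In a trained network, each of the four paths (high to high, low to high, low to low, high to low) would have its own learned 2-D kernels. A single fixed kernel is shared here only to keep the sketch short.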