Speech emotion recognition based on convolutional neural network
Author: Chen Jie
Venue: 2021 International Conference on Networking, Communications and Information Technology (NetCIT)
Publication date: 2021-12-01
DOI: 10.1109/NetCIT54147.2021.00028 (https://doi.org/10.1109/NetCIT54147.2021.00028)
Citation count: 1
Abstract
Speech emotion recognition is a technology for automatically identifying emotion types from given speech segments. With growing demand for emotion recognition in business, education, and other fields, developing high-accuracy speech emotion recognition systems has become an active research direction in the speech field. Speech emotion recognition treats speech as the carrier of emotion and studies how various emotions form and change in speech, so that a computer can infer the speaker's emotional state from speech and make human-computer interaction more natural. To improve the accuracy of intelligent speech emotion recognition systems, a speech emotion recognition model based on convolutional neural network (CNN) feature representation is proposed. Mel-frequency cepstral coefficients (MFCC), the most widely used method for extracting speech features, are selected for the experiment. In addition, to increase the feature differences between emotional speech samples, the MFCC feature matrix obtained from speech signal preprocessing is transformed to improve the speech emotion recognition rate.
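To make the MFCC step concrete, the following is a minimal NumPy sketch of a textbook MFCC pipeline (pre-emphasis, framing, Hamming window, power spectrum, mel filterbank, log, DCT). It is not the paper's implementation: the frame length, hop size, filter count, and all function names are illustrative assumptions. The abstract does not specify the transform applied to the MFCC matrix, so the `standardize` helper is only a hypothetical stand-in showing where such a transform would sit before a CNN.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def dct_matrix(n_out, n_in):
    """Unnormalized DCT-II basis, shape (n_out, n_in)."""
    n = np.arange(n_in)
    k = np.arange(n_out)[:, None]
    return np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))

def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_coeffs=13):
    """Return an MFCC matrix of shape (n_frames, n_coeffs).

    All default parameter values are assumptions, not taken from the paper.
    """
    # Pre-emphasis boosts high frequencies.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # Power spectrum, mel filterbank energies, log compression, DCT.
    power = np.abs(np.fft.rfft(frames, n_fft, axis=1)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(energies + 1e-10)
    return log_energies @ dct_matrix(n_coeffs, n_filters).T

def standardize(features):
    """Hypothetical stand-in transform: zero mean, unit variance per coefficient."""
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    return (features - mean) / (std + 1e-8)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    one_second = rng.standard_normal(16000)   # synthetic noise, not real speech
    features = standardize(mfcc(one_second))
    cnn_input = features[None, :, :, None]    # (batch, frames, coeffs, channels)
    print(features.shape, cnn_input.shape)
```

The resulting matrix has one row per frame and one column per coefficient, so it can be treated like a single-channel image when fed to a 2D convolutional network; the trailing axis added in the demo is that channel dimension.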