Emotion Recognition in Singing using Convolutional Neural Networks

Yingchao Shi, Xiao Zhou
DOI: 10.1109/ICSP51882.2021.9408959
Published in: 2021 6th International Conference on Intelligent Computing and Signal Processing (ICSP)
Publication date: 2021-04-09
Citations: 1

Abstract

With the development of deep learning, convolutional neural networks (CNNs) have been widely applied in the field of emotion recognition. The key to enhancing the performance of a singing emotion recognition system is selecting suitable features and establishing reliable models. Mel Frequency Cepstral Coefficient (MFCC) features have been proven effective for recognizing emotions. Therefore, in this paper, a CNN is used to build the singing emotion recognition model, and MFCCs are used for feature extraction. To improve the accuracy of the system, the feature matrices are segmented into small slices, and a majority vote over the per-slice predictions is used at test time to identify the emotion. To verify the generalization of the system, this paper evaluates two approaches to model building: one builds separate models for male and female speakers, and the other builds a single mixed model. Both approaches improve the accuracy of the singing emotion recognition system, and the accuracy is not noticeably affected by the choice of separate versus mixed models.
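The slicing and majority-vote steps described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the slice width, function names, and the assumption that the MFCC matrix has shape (n_mfcc, n_frames) are all our own choices for exposition.

```python
import numpy as np

def slice_mfcc(mfcc, slice_len=32):
    """Split an (n_mfcc, n_frames) MFCC matrix into fixed-width,
    non-overlapping slices along the time axis; a trailing remainder
    shorter than slice_len is dropped."""
    n_frames = mfcc.shape[1]
    slices = []
    for start in range(0, n_frames - slice_len + 1, slice_len):
        slices.append(mfcc[:, start:start + slice_len])
    return slices

def majority_vote(per_slice_labels):
    """Return the emotion label predicted for the largest number of
    slices (ties broken by label order, an arbitrary convention)."""
    labels, counts = np.unique(per_slice_labels, return_counts=True)
    return labels[np.argmax(counts)]
```

At test time, each slice would be passed through the trained CNN to get a per-slice label, and `majority_vote` would aggregate those labels into a single prediction for the whole recording.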