Emotion Recognition in Singing using Convolutional Neural Networks

Yingchao Shi, Xiao Zhou
DOI: 10.1109/ICSP51882.2021.9408959
Published in: 2021 6th International Conference on Intelligent Computing and Signal Processing (ICSP)
Publication date: 2021-04-09
Citations: 1

Abstract

With the development of deep learning, convolutional neural networks (CNNs) have been widely applied in the field of emotion recognition. The key to enhancing the performance of a singing emotion recognition system is selecting suitable features and establishing reliable models. Mel Frequency Cepstral Coefficient (MFCC) features have been proven effective for recognizing emotions. Therefore, in this paper, a CNN is used to build the singing emotion recognition model, and MFCCs are used for feature extraction. To improve the accuracy of the system, the feature matrices are segmented into small slices, and a majority vote over the per-slice predictions is used at test time to identify the emotion. To verify the generalization of the system, this paper evaluates two approaches to model building: one builds separate models for male and female speakers, and the other builds a single mixed model. Both approaches improve the accuracy of the singing emotion recognition system, and the accuracy is not noticeably affected by the choice of separate versus mixed models.
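The slicing and majority-vote steps described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the slice width, function names, and the assumption that the MFCC matrix has shape (n_mfcc, n_frames) are all our own choices for exposition.

```python
import numpy as np

def slice_mfcc(mfcc, slice_len=32):
    """Split an (n_mfcc, n_frames) MFCC matrix into fixed-width,
    non-overlapping slices along the time axis; a trailing remainder
    shorter than slice_len is dropped."""
    n_frames = mfcc.shape[1]
    slices = []
    for start in range(0, n_frames - slice_len + 1, slice_len):
        slices.append(mfcc[:, start:start + slice_len])
    return slices

def majority_vote(per_slice_labels):
    """Return the emotion label predicted for the largest number of
    slices (ties broken by label order, an arbitrary convention)."""
    labels, counts = np.unique(per_slice_labels, return_counts=True)
    return labels[np.argmax(counts)]
```

At test time, each slice would be passed through the trained CNN to get a per-slice label, and `majority_vote` would aggregate those labels into a single prediction for the whole recording.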