Gender Differentiated Convolutional Neural Networks for Speech Emotion Recognition

P. Mishra, Ruchir Sharma
Published in: 2020 12th International Congress on Ultra Modern Telecommunications and Control Systems and Workshops (ICUMT)
Publication date: 2020-10-01
DOI: 10.1109/ICUMT51630.2020.9222412 (https://doi.org/10.1109/ICUMT51630.2020.9222412)
Citations: 4

Abstract

This paper proposes a two-stage gender-differentiated system for Speech Emotion Recognition (SER) using Mel-frequency Cepstral Coefficients (MFCCs) and Convolutional Neural Networks (CNNs). Acoustic differences between male and female speakers pose a problem, and it is established that gender-dependent emotion recognizers perform better than gender-independent ones. The proposed solution can recognize seven emotions (anger, disgust, fear, happiness, sadness, surprise, and neutral state). Data augmentation is used to compensate for the lack of quality data, with the raw speech samples drawn from four datasets: RAVDESS, CREMA-D, SAVEE, and TESS. The system is composed of two stages: 1) gender classification; and 2) emotion classification. The output of the gender classifier in the first stage determines which gender-specific classifier is used in the second stage. The experimental evaluation reports performance in terms of the correct emotion recognition rate of the proposed SER model. The results demonstrate that a gender-differentiated system significantly improves performance. The results also show that using Global Average Pooling instead of a fully-connected network at the end of the CNN classifier further improves performance. Future implementations of the proposed system may enable effective intelligent human-computer interaction.
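
The two-stage routing and the Global Average Pooling head described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy classifiers, and the assumption that the final convolutional layer emits one feature map per emotion class are all hypothetical, chosen only to show the control flow. A real system would use trained CNNs over MFCC features in place of the stand-ins.

```python
from typing import Callable, Dict, List, Sequence

# The seven emotion classes listed in the abstract.
EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise", "neutral"]

Features = Sequence[Sequence[float]]    # e.g. an MFCC matrix (frames x coefficients)
FeatureMap = Sequence[Sequence[float]]  # one 2-D channel of CNN activations


def global_average_pool(feature_maps: Sequence[FeatureMap]) -> List[float]:
    """Reduce each channel to the mean of its activations (one value per channel)."""
    pooled = []
    for channel in feature_maps:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def classify_with_gap(feature_maps: Sequence[FeatureMap]) -> str:
    """Assumed head: one feature map per emotion, pooled to a score, then argmax."""
    scores = global_average_pool(feature_maps)
    if len(scores) != len(EMOTIONS):
        raise ValueError("expected one feature map per emotion class")
    best = max(range(len(scores)), key=scores.__getitem__)
    return EMOTIONS[best]


def recognize_emotion(
    features: Features,
    gender_classifier: Callable[[Features], str],
    emotion_classifiers: Dict[str, Callable[[Features], str]],
) -> str:
    """Stage 1 predicts gender; stage 2 runs that gender's emotion classifier."""
    gender = gender_classifier(features)
    return emotion_classifiers[gender](features)


# Hypothetical stand-ins for trained models; the rules below are arbitrary.
def toy_gender_classifier(features: Features) -> str:
    values = [v for row in features for v in row]
    return "female" if sum(values) / len(values) > 0 else "male"


toy_emotion_classifiers: Dict[str, Callable[[Features], str]] = {
    "female": lambda features: "happiness",
    "male": lambda features: "sadness",
}
```

Calling `recognize_emotion(features, toy_gender_classifier, toy_emotion_classifiers)` routes the same features through the gender stage first and then through the matching emotion classifier. The pooling head has no trainable weights of its own, which is one commonly cited reason for preferring it over a fully-connected layer.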