Stride Based Convolutional Neural Network for Speech Emotion Recognition

T. Wani, T. Gunawan, Syed Asif Ahmad Qadri, H. Mansor, F. Arifin, Y. Ahmad
Published in: 2021 IEEE 7th International Conference on Smart Instrumentation, Measurement and Applications (ICSIMA)
Publication date: 2021-08-23
DOI: 10.1109/ICSIMA50015.2021.9526320 (https://doi.org/10.1109/ICSIMA50015.2021.9526320)
Citations: 3

Abstract

Speech Emotion Recognition (SER) recognizes the emotional features of speech signals regardless of their semantic content. Deep learning techniques have proven superior to conventional techniques for emotion recognition because of advantages such as speed, scalability, and versatility. However, since emotions are subjective, there is no universal agreement on how to evaluate or categorize them. The main objective of this paper is to design a suitable Convolutional Neural Network (CNN) model, the Stride-based Convolutional Neural Network (SCNN), that uses fewer convolutional layers and eliminates the pooling layers to increase computational stability. Eliminating the pooling layers tends to increase the accuracy and decrease the computational time of the SER system. Instead of pooling layers, deep strides are used for the necessary dimension reduction. SCNN is trained on spectrograms generated from the speech signals of two different databases, Berlin (Emo-DB) and IITKGP-SEHSC. Four emotions (angry, happy, neutral, and sad) are considered in the evaluation, and validation accuracies of 90.67% and 91.33% are achieved for Emo-DB and IITKGP-SEHSC, respectively. This study provides new benchmarks for both datasets, demonstrating the feasibility and relevance of the presented SER technique.
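The idea of replacing pooling with strides can be illustrated with a small sketch. The abstract does not state the kernel sizes, strides, or input dimensions used in SCNN, so every number and name below (the 8x8 patch, the 2x2 kernel, `conv_output_size`, `strided_conv2d`) is a hypothetical chosen only to show how a stride greater than 1 shrinks the feature map without a separate pooling layer.

```python
def conv_output_size(size, kernel, stride, padding=0):
    """Output length along one axis for a convolution (or pooling) layer."""
    return (size + 2 * padding - kernel) // stride + 1


def strided_conv2d(image, kernel, stride):
    """Unpadded 2-D cross-correlation with a square kernel and a given stride."""
    k = len(kernel)
    out_h = conv_output_size(len(image), k, stride)
    out_w = conv_output_size(len(image[0]), k, stride)
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            # Slide the kernel window in steps of `stride` instead of 1.
            for di in range(k):
                for dj in range(k):
                    acc += image[i * stride + di][j * stride + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


# Hypothetical 8x8 "spectrogram" patch and 2x2 averaging kernel (illustrative only).
patch = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
kernel = [[0.25, 0.25], [0.25, 0.25]]

# Stride 2 halves each spatial dimension in a single layer, with no pooling step.
reduced = strided_conv2d(patch, kernel, stride=2)
print(len(reduced), len(reduced[0]))
```

With stride 1 the same kernel would yield a 7x7 map that a pooling layer would then have to shrink; with stride 2 the convolution itself performs the reduction, which is the general mechanism the abstract describes.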