Deep Learning Techniques for Speech Emotion Recognition: A Review
S. Pandey, H. S. Shekhawat, S. Prasanna
2019 29th International Conference Radioelektronika (RADIOELEKTRONIKA), published 2019-04-01
DOI: 10.1109/RADIOELEK.2019.8733432
Citation count: 53
Abstract
This paper presents an introduction to various deep learning techniques for capturing and classifying emotional states from speech utterances. Architectures such as the Convolutional Neural Network (CNN) and the Long Short-Term Memory (LSTM) network are used to test how well emotion can be captured from standard speech representations, namely the mel spectrogram, the magnitude spectrogram, and Mel-Frequency Cepstral Coefficients (MFCCs), on two popular datasets, EMO-DB and IEMOCAP. Experimental findings, together with the reasoning behind them, are presented to show which architecture and feature combination is better suited to speech emotion recognition. The work explores the basic deep learning architectures widely used in the literature.
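The three representations named in the abstract are closely related. The magnitude spectrogram is the absolute value of a short-time Fourier transform. The mel spectrogram pools the spectral power through triangular filters spaced on the mel scale. MFCCs are a discrete cosine transform of the log mel energies. The following is a minimal NumPy sketch of that chain. The abstract does not give the paper's feature settings, so several choices are illustrative assumptions: the 512-sample frame, the 160-sample hop (10 ms at an assumed 16 kHz rate), 40 mel bands, 13 coefficients, log compression of the mel energies, and the HTK-style mel formula. All function names are hypothetical.

```python
import numpy as np

# Assumed settings for illustration; the abstract does not state the paper's values.
SR, N_FFT, HOP, N_MELS, N_MFCC = 16000, 512, 160, 40, 13


def magnitude_spectrogram(signal, n_fft=N_FFT, hop=HOP):
    """|STFT| with a Hann window; shape (frames, n_fft // 2 + 1). Needs len(signal) >= n_fft."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))


def hz_to_mel(f):
    # HTK-style mel scale (one common variant; assumed here).
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr=SR, n_fft=N_FFT, n_mels=N_MELS):
    """Triangular filters evenly spaced on the mel scale; shape (n_mels, n_fft // 2 + 1)."""
    edges_hz = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    bin_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    fbank = np.zeros((n_mels, bin_freqs.size))
    for m in range(n_mels):
        left, center, right = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        fbank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fbank


def dct_ii(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))  # shape (n_out, n)
    scale = np.full(n_out, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale


def extract_features(signal, sr=SR):
    """Return (magnitude spectrogram, log mel spectrogram, MFCCs), one row per frame."""
    mag = magnitude_spectrogram(signal)
    mel = (mag ** 2) @ mel_filterbank(sr).T   # power spectrum pooled into mel bands
    log_mel = np.log(mel + 1e-10)             # small offset avoids log(0)
    mfcc = dct_ii(log_mel, N_MFCC)            # decorrelate the log mel energies
    return mag, log_mel, mfcc


# Usage: one second of a synthetic 440 Hz tone standing in for a speech utterance.
t = np.arange(SR) / SR
tone = np.sin(2 * np.pi * 440.0 * t)
mag, log_mel, mfcc = extract_features(tone)
print(mag.shape, log_mel.shape, mfcc.shape)  # (97, 257) (97, 40) (97, 13)
```

Each array has one row per frame. Any of the three could therefore be given to a CNN as a two-dimensional time-frequency image, or to an LSTM as a sequence of per-frame feature vectors. In practice, a dedicated audio library would usually replace these hand-written helpers.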