An Acoustic Analysis of Speech for Emotion Recognition using Deep Learning
Aman Verma, Raghav Agrawal, Priyanka Singh, N. Ansari
2022 1st International Conference on the Paradigm Shifts in Communication, Embedded Systems, Machine Learning and Signal Processing (PCEMS)
DOI: 10.1109/PCEMS55161.2022.9808012 · Published: 2022-05-06 · Citations: 2
Abstract
Speech emotion recognition has advanced considerably as a result of progress in deep learning algorithms, which can readily extract features from data and learn to recognize patterns in them. Although these algorithms can successfully recognize emotions, their efficiency is often debated. The main objective of this paper is to efficiently classify the emotional state of a person from speech signals using traditional machine learning and deep learning techniques, and to present a comparative analysis. We consider eight different types of emotions and analyze them in two ways: first, by treating male and female emotions jointly (gender-neutral), yielding eight classes; and second, by treating male and female emotions separately (gender-based), yielding a total of 16 classes. We have experimented with several architectures, including K-Nearest Neighbors (KNN), Multilayer Perceptron (MLP), and a One-Dimensional Convolutional Neural Network combined with Long Short-Term Memory (1D CNN+LSTM), tuning their hyperparameters to classify the emotional states. The best results are obtained with the 1D CNN+LSTM model, which achieves an accuracy of 87.4% in the gender-neutral case and 82.78% in the gender-based case, outperforming existing techniques.
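The abstract names the winning architecture (1D CNN + LSTM) but not its exact configuration. For orientation, below is a minimal Keras sketch of that style of model: convolutions capture local spectral-temporal patterns in frame-level features, and an LSTM models longer-range dependencies across frames. The input shape (MFCC-like frames), layer sizes, class count, and training setup are assumptions for illustration, not the authors' published configuration.

```python
# Minimal 1D CNN + LSTM sketch for speech emotion classification.
# Feature shape, layer sizes, and hyperparameters are illustrative assumptions.
import numpy as np
from tensorflow.keras import layers, models

NUM_CLASSES = 8      # 8 gender-neutral emotions (16 for the gender-based split)
TIME_STEPS = 160     # assumed number of frames per utterance
NUM_FEATURES = 40    # assumed number of MFCC coefficients per frame

model = models.Sequential([
    layers.Input(shape=(TIME_STEPS, NUM_FEATURES)),
    # 1D convolutions learn local patterns across neighboring frames
    layers.Conv1D(64, kernel_size=5, activation="relu", padding="same"),
    layers.MaxPooling1D(pool_size=2),
    layers.Conv1D(128, kernel_size=5, activation="relu", padding="same"),
    layers.MaxPooling1D(pool_size=2),
    # the LSTM summarizes the sequence into a single utterance-level vector
    layers.LSTM(64),
    layers.Dropout(0.3),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Smoke test on dummy data shaped like batched MFCC sequences.
X = np.random.randn(32, TIME_STEPS, NUM_FEATURES).astype("float32")
y = np.random.randint(0, NUM_CLASSES, size=32)
model.fit(X, y, epochs=1, batch_size=8)
```

Switching between the gender-neutral and gender-based experiments described above amounts to changing NUM_CLASSES from 8 to 16 and relabeling the training targets accordingly.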