An Acoustic Analysis of Speech for Emotion Recognition using Deep Learning

Aman Verma, Raghav Agrawal, Priyanka Singh, N. Ansari
DOI: 10.1109/PCEMS55161.2022.9808012
Published: 2022-05-06, in 2022 1st International Conference on the Paradigm Shifts in Communication, Embedded Systems, Machine Learning and Signal Processing (PCEMS)
Citations: 2

Abstract

Speech emotion recognition has shown several advancements as a result of progress in deep learning algorithms. These algorithms can readily extract features from the data and learn to recognize patterns in them. Although these algorithms can successfully recognize emotions, their efficiency is often debated. The main objective of this paper is to efficiently classify the emotional state of a person from speech signals using traditional machine learning and deep learning techniques, and to present a comparative analysis. We have considered eight different types of emotions and analyzed them in two ways: first, by considering male and female emotions jointly (gender-neutral), where they are classified into eight classes, and second, separately for male and female emotions (gender-based), for a total of 16 classes. We have performed experiments with several architectures, including K-Nearest Neighbors (KNN), Multilayer Perceptron (MLP), and a One-Dimensional Convolutional Neural Network + Long Short-Term Memory (1D CNN+LSTM), tuning the hyperparameters to classify the emotional states. The best results are obtained with the 1D CNN+LSTM model: an accuracy of 87.4% for the gender-neutral case and 82.78% for the gender-based case. This model outperforms existing techniques.
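The 1D CNN+LSTM pipeline described above can be illustrated with a minimal numpy forward pass: a 1D convolution slides over the time axis of an acoustic feature sequence (e.g. MFCC frames), an LSTM summarizes the convolved sequence, and a softmax head scores the eight emotion classes of the gender-neutral setting. All dimensions and weights here are illustrative stand-ins, not the authors' trained configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumed, not from the paper): T time frames of
# F-dimensional features, C conv filters, H LSTM units, K = 8 emotion classes.
T, F, C, H, K = 100, 40, 16, 32, 8

def conv1d(x, w, b):
    """Valid 1D convolution along time with ReLU. x: (T, F), w: (k, F, C)."""
    k = w.shape[0]
    out = np.stack([np.tensordot(x[t:t + k], w, axes=((0, 1), (0, 1))) + b
                    for t in range(x.shape[0] - k + 1)])
    return np.maximum(out, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_last_hidden(x, Wx, Wh, b):
    """Run a single-layer LSTM over x: (T', C); return the final hidden state."""
    h = np.zeros(H)
    c = np.zeros(H)
    for t in range(x.shape[0]):
        z = x[t] @ Wx + h @ Wh + b           # gates stacked as (4H,)
        i, f, g, o = np.split(z, 4)          # input, forget, cell, output
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)
        h = o * np.tanh(c)
    return h

# Random weights stand in for trained parameters.
w_conv = rng.normal(0, 0.1, (5, F, C)); b_conv = np.zeros(C)
Wx = rng.normal(0, 0.1, (C, 4 * H)); Wh = rng.normal(0, 0.1, (H, 4 * H))
b_lstm = np.zeros(4 * H)
W_out = rng.normal(0, 0.1, (H, K)); b_out = np.zeros(K)

features = rng.normal(size=(T, F))           # stand-in for one utterance's MFCCs
h_final = lstm_last_hidden(conv1d(features, w_conv, b_conv), Wx, Wh, b_lstm)
logits = h_final @ W_out + b_out
probs = np.exp(logits - logits.max())
probs /= probs.sum()                         # softmax over the 8 emotions
print(probs.shape)
```

The gender-based setting would simply use K = 16 output classes; everything upstream of the softmax head is unchanged.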