An Acoustic Analysis of Speech for Emotion Recognition using Deep Learning

Aman Verma, Raghav Agrawal, Priyanka Singh, N. Ansari
DOI: 10.1109/PCEMS55161.2022.9808012
Published: 2022-05-06, in 2022 1st International Conference on the Paradigm Shifts in Communication, Embedded Systems, Machine Learning and Signal Processing (PCEMS)
Citations: 2

Abstract

Speech emotion recognition has shown several advancements as a result of progress in deep learning algorithms. These algorithms can readily extract features from the data and learn to recognize patterns in them. Although these algorithms can successfully recognize emotions, their efficiency is often debated. The main objective of this paper is to efficiently classify the emotional state of a person from speech signals using traditional machine learning and deep learning techniques, and to present a comparative analysis. We have considered eight different types of emotions and analyzed them in two ways: first, by considering male and female emotions jointly (gender-neutral), where they are classified into eight classes, and second, separately for male and female emotions (gender-based), for a total of 16 classes. We have performed experiments with several architectures, including K-Nearest Neighbors (KNN), Multilayer Perceptron (MLP), and a One-Dimensional Convolutional Neural Network + Long Short-Term Memory (1D CNN+LSTM), tuning the hyperparameters to classify the emotional states. The best results are obtained with the 1D CNN+LSTM model: an accuracy of 87.4% for the gender-neutral case and 82.78% for the gender-based case. This model outperforms existing techniques.
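The 1D CNN+LSTM pipeline described above can be illustrated with a minimal numpy forward pass: a 1D convolution slides over the time axis of an acoustic feature sequence (e.g. MFCC frames), an LSTM summarizes the convolved sequence, and a softmax head scores the eight emotion classes of the gender-neutral setting. All dimensions and weights here are illustrative stand-ins, not the authors' trained configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumed, not from the paper): T time frames of
# F-dimensional features, C conv filters, H LSTM units, K = 8 emotion classes.
T, F, C, H, K = 100, 40, 16, 32, 8

def conv1d(x, w, b):
    """Valid 1D convolution along time with ReLU. x: (T, F), w: (k, F, C)."""
    k = w.shape[0]
    out = np.stack([np.tensordot(x[t:t + k], w, axes=((0, 1), (0, 1))) + b
                    for t in range(x.shape[0] - k + 1)])
    return np.maximum(out, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_last_hidden(x, Wx, Wh, b):
    """Run a single-layer LSTM over x: (T', C); return the final hidden state."""
    h = np.zeros(H)
    c = np.zeros(H)
    for t in range(x.shape[0]):
        z = x[t] @ Wx + h @ Wh + b           # gates stacked as (4H,)
        i, f, g, o = np.split(z, 4)          # input, forget, cell, output
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)
        h = o * np.tanh(c)
    return h

# Random weights stand in for trained parameters.
w_conv = rng.normal(0, 0.1, (5, F, C)); b_conv = np.zeros(C)
Wx = rng.normal(0, 0.1, (C, 4 * H)); Wh = rng.normal(0, 0.1, (H, 4 * H))
b_lstm = np.zeros(4 * H)
W_out = rng.normal(0, 0.1, (H, K)); b_out = np.zeros(K)

features = rng.normal(size=(T, F))           # stand-in for one utterance's MFCCs
h_final = lstm_last_hidden(conv1d(features, w_conv, b_conv), Wx, Wh, b_lstm)
logits = h_final @ W_out + b_out
probs = np.exp(logits - logits.max())
probs /= probs.sum()                         # softmax over the 8 emotions
print(probs.shape)
```

The gender-based setting would simply use K = 16 output classes; everything upstream of the softmax head is unchanged.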