Arabic English Speech Emotion Recognition System

2023 20th Learning and Technology Conference (L&T). Pub Date: 2023-01-26. DOI: 10.1109/LT58159.2023.10092295

Mai El Seknedy, S. Fawzi


Abstract

Speech Emotion Recognition (SER) identifies a speaker's emotions from speech. This is important for human-machine interface applications and for the emerging Metaverse. This work presents a bilingual Arabic-English speech emotion recognition system based on the EYASE and RAVDESS datasets. A novel feature set was composed from spectral and prosodic parameters to obtain high performance at a low computational cost. Several machine learning classifiers were applied: Random Forest, Support Vector Machine (SVM), Logistic Regression, Multi-Layer Perceptron, and Ensemble learning. The performance of the proposed feature set was compared with that of the "Interspeech 2009" challenge feature set, which is considered a benchmark in the field. Promising results were obtained using the proposed feature set. SVM gave the best emotion recognition rate and execution performance. The best accuracies achieved were 85% on RAVDESS and 64% on EYASE. Ensemble learning detected emotional valence at a rate of 90% on RAVDESS and 87.6% on EYASE.
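The abstract reports both per-emotion accuracy and a higher valence detection rate from ensemble learning. The sketch below illustrates, under stated assumptions, how the two scores can relate: it combines the label predictions of several classifiers by hard majority voting and then scores the result at both the emotion level and the valence level. The abstract does not say which ensemble scheme or which emotion-to-valence grouping the authors used, so the voting rule, the `VALENCE` mapping, the function names, and the toy predictions are all hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical emotion-to-valence grouping; the abstract does not give the paper's mapping.
VALENCE = {
    "happy": "positive",
    "calm": "positive",
    "angry": "negative",
    "sad": "negative",
    "fearful": "negative",
}


def majority_vote(predictions):
    """Combine per-classifier label lists by hard voting, one label per utterance.

    Ties go to the tied label voted for by the earliest-listed classifier.
    """
    combined = []
    for votes in zip(*predictions):
        counts = Counter(votes)
        best = max(counts.values())
        combined.append(next(v for v in votes if counts[v] == best))
    return combined


def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the reference label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def to_valence(labels):
    """Map each emotion label to its (assumed) valence class."""
    return [VALENCE[label] for label in labels]


# Toy predictions from three hypothetical classifiers on four utterances.
svm_pred = ["happy", "sad", "angry", "sad"]
rf_pred = ["happy", "angry", "angry", "angry"]
mlp_pred = ["sad", "sad", "fearful", "angry"]
truth = ["happy", "sad", "angry", "sad"]

ensemble_pred = majority_vote([svm_pred, rf_pred, mlp_pred])
emotion_acc = accuracy(ensemble_pred, truth)
valence_acc = accuracy(to_valence(ensemble_pred), to_valence(truth))
```

In this toy data the ensemble confuses "sad" with "angry" on the last utterance, which counts as an error at the emotion level but not at the valence level, since both labels fall in the same assumed valence class. Collapsing labels through a fixed mapping can never lower accuracy, which is one general reason a valence score can exceed a per-emotion score.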


