Arabic English Speech Emotion Recognition System

2023 20th Learning and Technology Conference (L&T). Pub Date: 2023-01-26. DOI: 10.1109/LT58159.2023.10092295

Mai El Seknedy, S. Fawzi


Citations: 0

Abstract



A Speech Emotion Recognition (SER) system identifies a speaker's emotions from their speech. This capability matters for human-machine interface applications and for the emerging Metaverse. This work presents a bilingual Arabic-English speech emotion recognition system built on the EYASE and RAVDESS datasets. A novel feature set combining spectral and prosodic parameters was composed to obtain high performance at a low computational cost. Several machine learning classifiers were applied: Random Forest, Support Vector Machine (SVM), Logistic Regression, Multi-Layer Perceptron, and Ensemble learning. The performance of the proposed feature set was compared with that of the "Interspeech 2009" challenge feature set, which is considered a benchmark in the field. The proposed feature sets gave promising results. SVM achieved the best emotion recognition rate and the best execution performance. The best accuracies achieved were 85% on RAVDESS and 64% on EYASE. For valence detection, ensemble learning reached 90% on RAVDESS and 87.6% on EYASE.
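The abstract describes a low-cost feature set built from spectral and prosodic parameters and an ensemble of classifiers, but it does not list the exact features. The Python sketch below is a hedged illustration only and is not the authors' feature set. It computes two cheap frame-level descriptors: short-time energy, a prosodic loudness cue, and zero-crossing rate, a coarse spectral cue. It summarizes each descriptor per utterance with its mean and standard deviation, then combines classifier outputs by hard majority voting. The following are all illustrative assumptions: the function names, the 400-sample frame and 160-sample hop (25 ms and 10 ms at an assumed 16 kHz sampling rate), and the voting rule.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def frame_signal(samples, frame_len, hop):
    """Split a 1-D signal into overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Mean squared amplitude of one frame (a cheap loudness/prosodic cue)."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign differs (a coarse spectral cue)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def utterance_features(samples, frame_len=400, hop=160):
    """Summarize frame-level descriptors into one fixed-length utterance vector.

    Returns [mean energy, std energy, mean ZCR, std ZCR]. The defaults
    (25 ms / 10 ms at an assumed 16 kHz rate) are illustrative assumptions.
    """
    frames = frame_signal(samples, frame_len, hop)
    if not frames:
        raise ValueError("signal is shorter than one frame")
    energy = [short_time_energy(f) for f in frames]
    zcr = [zero_crossing_rate(f) for f in frames]
    return [mean(energy), pstdev(energy), mean(zcr), pstdev(zcr)]


def majority_vote(predictions):
    """Hard-voting ensemble: return the most common label among classifier outputs.

    Ties go to the label encountered first, following Counter.most_common ordering.
    """
    return Counter(predictions).most_common(1)[0][0]


if __name__ == "__main__":
    sr = 16000  # assumed sampling rate
    # One second of a 200 Hz tone stands in for a real speech recording.
    tone = [math.sin(2 * math.pi * 200 * n / sr) for n in range(sr)]
    print(utterance_features(tone))
    # Labels predicted by three hypothetical classifiers for one utterance.
    print(majority_vote(["angry", "sad", "angry"]))
```

In a full pipeline these fixed-length vectors would be passed to classifiers such as those named in the abstract. Descriptors such as pitch (F0) or MFCCs, or the Interspeech 2009 set, would typically carry far more emotional information than these two toy cues.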

