Title: Multilingual speech emotion recognition using IGRFXG – Ensemble feature selection approach
Authors: Astha Tripathi, Poonam Rani
Journal: Applied Acoustics, Volume 240, Article 110905 (Q1, Acoustics; Impact Factor 3.4)
DOI: 10.1016/j.apacoust.2025.110905
Published: 2025-06-24 (Journal Article)
URL: https://www.sciencedirect.com/science/article/pii/S0003682X25003779
Citations: 0
Abstract
In the field of Human–Computer Interaction, it is essential to recognize emotions through speech signals. Selecting the important features from speech signals is crucial to increasing the accuracy of machine learning classifiers, since including unnecessary features diminishes model accuracy and increases system complexity. In this study, we propose a novel approach to tackle the feature selection challenge in speech emotion recognition. Our method employs IGRFXG, an ensemble feature selection approach named after the three base feature selection techniques it combines: Information Gain (IG), Random Forest (RF), and XGBoost (XG). It operates in two stages. In the first stage, to reduce the presence of unnecessary features, we propose a heterogeneous ensemble feature selection technique that integrates the three distinct feature selection methods (Information Gain, Random Forest, and XGBoost) and prioritizes the features based on their importance scores. In the second stage, we intersect the top features selected by the various feature selectors to generate a feature kernel, which is subsequently passed to machine learning classifiers. The proposed approach is evaluated on four publicly accessible datasets, namely RAVDESS, SUBESCO, EMOVO, and EMODB, using six different machine learning classifiers. The performance of the ML models is assessed with metrics such as accuracy, recall, precision, and F1 score. The performance assessments indicate that the maximum accuracies achieved are 79.28 %, 92.80 %, 95.00 %, and 90.96 % for the RAVDESS, SUBESCO, EMOVO, and EMODB datasets, respectively.
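The two-stage procedure described in the abstract — rank features by importance under each base selector, then intersect the top-ranked sets into a "feature kernel" — can be sketched as below. This is a minimal illustration, not the authors' implementation: the feature names and importance scores are hypothetical stand-ins for the values that Information Gain, Random Forest, and XGBoost would produce on real speech features.

```python
def feature_kernel(importance_by_selector, top_k):
    """Stage 1: rank features within each selector by importance score.
    Stage 2: intersect the top-k feature sets across all selectors."""
    top_sets = []
    for scores in importance_by_selector:
        # sort feature names by descending importance and keep the top_k
        ranked = sorted(scores, key=scores.get, reverse=True)
        top_sets.append(set(ranked[:top_k]))
    # the feature kernel is what every selector agrees is important
    return set.intersection(*top_sets)

# Hypothetical importance scores from the three base selectors (IG, RF, XG)
ig = {"mfcc1": 0.9, "mfcc2": 0.7, "pitch": 0.4, "zcr": 0.1}
rf = {"mfcc1": 0.8, "pitch": 0.6, "mfcc2": 0.5, "zcr": 0.2}
xg = {"mfcc2": 0.9, "mfcc1": 0.6, "zcr": 0.5, "pitch": 0.3}

kernel = feature_kernel([ig, rf, xg], top_k=2)
print(kernel)  # only "mfcc1" is in every selector's top 2
```

In practice, the three score dictionaries would come from `mutual_info_classif`, a fitted Random Forest's `feature_importances_`, and an XGBoost model's importance scores; only the columns in the resulting kernel are then fed to the downstream classifiers.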
Journal introduction
Since its launch in 1968, Applied Acoustics has been publishing high-quality research papers providing state-of-the-art coverage of research findings for engineers and scientists involved in applications of acoustics in the widest sense.
Applied Acoustics looks not only at recent developments in the understanding of acoustics but also at ways of exploiting that understanding. The Journal aims to encourage the exchange of practical experience through publication and in so doing creates a fund of technological information that can be used for solving related problems. The presentation of information in graphical or tabular form is especially encouraged. If a report of a mathematical development is a necessary part of a paper, it is important to ensure that it is there only as an integral part of a practical solution to a problem and is supported by data. Applied Acoustics encourages the exchange of practical experience in the following ways:
• Complete Papers
• Short Technical Notes
• Review Articles
and thereby provides a wealth of technological information that can be used to solve related problems.
Manuscripts that address all fields of applications of acoustics ranging from medicine and NDT to the environment and buildings are welcome.