Title: Multilingual speech emotion recognition using IGRFXG – Ensemble feature selection approach
Authors: Astha Tripathi, Poonam Rani
Journal: Applied Acoustics, Volume 240, Article 110905 (Q1, Acoustics; Impact Factor 3.4)
DOI: 10.1016/j.apacoust.2025.110905
Published: 2025-06-24 (Journal Article)
URL: https://www.sciencedirect.com/science/article/pii/S0003682X25003779
Citations: 0
Abstract
In the field of Human–Computer Interaction, it is essential to recognize emotions through speech signals. Selecting the important features from speech signals is crucial to increasing the accuracy of machine learning classifiers, since including unnecessary features diminishes model accuracy and increases system complexity. In this study, we propose a novel approach to tackle the feature selection challenge in speech emotion recognition. Our method employs IGRFXG, an ensemble feature selection approach named after the three base feature selection techniques it combines: Information Gain (IG), Random Forest (RF), and XGBoost (XG). It operates in two stages. In the first stage, to reduce the presence of unnecessary features, we propose a heterogeneous ensemble feature selection technique that integrates the three distinct feature selection methods (Information Gain, Random Forest, and XGBoost) and prioritizes the features based on their importance scores. In the second stage, we intersect the top features selected by the various feature selectors to generate a feature kernel, which is subsequently passed to machine learning classifiers. The proposed approach is evaluated on four publicly accessible datasets, namely RAVDESS, SUBESCO, EMOVO, and EMODB, using six different machine learning classifiers. The performance of the ML models is assessed with metrics such as accuracy, recall, precision, and F1 score. The performance assessments indicate that the maximum accuracies achieved are 79.28 %, 92.80 %, 95.00 %, and 90.96 % for the RAVDESS, SUBESCO, EMOVO, and EMODB datasets, respectively.
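The two-stage procedure described in the abstract — rank features by importance under each base selector, then intersect the top-ranked sets into a "feature kernel" — can be sketched as below. This is a minimal illustration, not the authors' implementation: the feature names and importance scores are hypothetical stand-ins for the values that Information Gain, Random Forest, and XGBoost would produce on real speech features.

```python
def feature_kernel(importance_by_selector, top_k):
    """Stage 1: rank features within each selector by importance score.
    Stage 2: intersect the top-k feature sets across all selectors."""
    top_sets = []
    for scores in importance_by_selector:
        # sort feature names by descending importance and keep the top_k
        ranked = sorted(scores, key=scores.get, reverse=True)
        top_sets.append(set(ranked[:top_k]))
    # the feature kernel is what every selector agrees is important
    return set.intersection(*top_sets)

# Hypothetical importance scores from the three base selectors (IG, RF, XG)
ig = {"mfcc1": 0.9, "mfcc2": 0.7, "pitch": 0.4, "zcr": 0.1}
rf = {"mfcc1": 0.8, "pitch": 0.6, "mfcc2": 0.5, "zcr": 0.2}
xg = {"mfcc2": 0.9, "mfcc1": 0.6, "zcr": 0.5, "pitch": 0.3}

kernel = feature_kernel([ig, rf, xg], top_k=2)
print(kernel)  # only "mfcc1" is in every selector's top 2
```

In practice, the three score dictionaries would come from `mutual_info_classif`, a fitted Random Forest's `feature_importances_`, and an XGBoost model's importance scores; only the columns in the resulting kernel are then fed to the downstream classifiers.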
Journal introduction
Since its launch in 1968, Applied Acoustics has been publishing high-quality research papers providing state-of-the-art coverage of research findings for engineers and scientists involved in applications of acoustics in the widest sense.
Applied Acoustics looks not only at recent developments in the understanding of acoustics but also at ways of exploiting that understanding. The Journal aims to encourage the exchange of practical experience through publication and in so doing creates a fund of technological information that can be used for solving related problems. The presentation of information in graphical or tabular form is especially encouraged. If a report of a mathematical development is a necessary part of a paper, it is important to ensure that it is there only as an integral part of a practical solution to a problem and is supported by data. Applied Acoustics encourages the exchange of practical experience in the following ways:
• Complete Papers
• Short Technical Notes
• Review Articles
and thereby provides a wealth of technological information that can be used to solve related problems.
Manuscripts that address all fields of applications of acoustics ranging from medicine and NDT to the environment and buildings are welcome.