{"title":"Speech emotion classification using semi-supervised LSTM","authors":"Nattipon Itponjaroen, Kumpee Apsornpasakorn, Eakarat Pimthai, Khwanchai Kaewkaisorn, Shularp Panitchart, Thitirat Siriborvornratanakul","doi":"10.1007/s43674-023-00059-x","DOIUrl":null,"url":null,"abstract":"<div><p>Speech mood analysis is a challenging task with unclear optimal feature selection. The nature of the dataset, whether it is from an infant or adult, is crucial to consider. In this study, the characteristics of speech were investigated using Mel-frequency cepstral coefficients (MFCC) to analyze audio files. The CREMA-D dataset, which includes six different mood states (normal, angry, happy, sad, scared, and irritated), was employed to identify mood states from speech files. A mood classification system was proposed that integrates Support Vector Machines (SVM) and Long Short-Term Memory (LSTM) models to increase the number of labeled data in small datasets and improve classification accuracy.</p><p>A semi-supervised model was proposed in this study to improve the accuracy of speech mood classification systems. The approach was tested on a classification model that used SVM and LSTM, and it was found that the semi-supervised model outperforms both SVM and LSTM models, achieving a validation accuracy of 89.72%. This result surpasses the accuracy achieved by SVM and LSTM models alone. Moreover, the semi-supervised method was observed to accelerate the training process of the model. These outcomes illustrate the efficacy of the proposed model and its potential to enhance speech mood analysis techniques.</p></div>","PeriodicalId":72089,"journal":{"name":"Advances in computational intelligence","volume":"3 4","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2023-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Advances in computational intelligence","FirstCategoryId":"1085","ListUrlMain":"https://link.springer.com/article/10.1007/s43674-023-00059-x","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Speech emotion analysis is a challenging task, in part because the optimal set of acoustic features is not well established. The nature of the dataset, such as whether the speech comes from infants or adults, is also crucial to consider. In this study, speech characteristics were extracted from audio files as Mel-frequency cepstral coefficients (MFCCs). The CREMA-D dataset, which covers six emotion classes (neutral, angry, happy, sad, fearful, and disgusted), was used to identify emotional states from speech recordings. An emotion classification system is proposed that integrates Support Vector Machine (SVM) and Long Short-Term Memory (LSTM) models to increase the amount of labeled data available from a small dataset and improve classification accuracy.
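For readers who want to reproduce the feature-extraction step, a minimal sketch is given below. It assumes the librosa library and WAV clips from CREMA-D; the parameter choices (40 coefficients, 16 kHz resampling, 200-frame padding) are illustrative assumptions, not values reported in the paper.

```python
import numpy as np
import librosa  # assumed dependency; the paper does not name its toolkit

def extract_mfcc(path, n_mfcc=40, sr=16000, max_frames=200):
    """Load one audio clip and return a fixed-size (max_frames, n_mfcc) MFCC matrix."""
    y, _ = librosa.load(path, sr=sr)                          # resample to a common rate
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # (frames, n_mfcc)
    # pad or truncate along the time axis so every clip has the same length
    if mfcc.shape[0] < max_frames:
        mfcc = np.pad(mfcc, ((0, max_frames - mfcc.shape[0]), (0, 0)))
    return mfcc[:max_frames]

# Example: features = np.stack([extract_mfcc(p) for p in crema_d_wav_paths])
```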
A semi-supervised model is proposed to improve the accuracy of speech emotion classification. Evaluated against SVM and LSTM classifiers trained on their own, the semi-supervised model outperformed both, achieving a validation accuracy of 89.72%. Moreover, the semi-supervised method was observed to accelerate model training. These outcomes illustrate the efficacy of the proposed model and its potential to enhance speech emotion analysis techniques.
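The abstract does not spell out how the SVM and LSTM are combined, but one common reading of "increasing the number of labeled data" is SVM-based self-training (pseudo-labeling) followed by LSTM training on the enlarged labeled set. The sketch below follows that assumption; the pseudo_label and build_lstm helpers, the 0.9 confidence threshold, and the network sizes are hypothetical choices using scikit-learn and TensorFlow/Keras, not the authors' exact pipeline.

```python
import numpy as np
from sklearn.svm import SVC
from tensorflow.keras import layers, models

def pseudo_label(X_lab, y_lab, X_unlab, threshold=0.9):
    """Fit an SVM on the labeled set and keep only confident predictions
    on the unlabeled set as pseudo-labels (one self-training step)."""
    svm = SVC(probability=True).fit(X_lab.reshape(len(X_lab), -1), y_lab)
    proba = svm.predict_proba(X_unlab.reshape(len(X_unlab), -1))
    keep = proba.max(axis=1) >= threshold
    return X_unlab[keep], svm.classes_[proba[keep].argmax(axis=1)]

def build_lstm(timesteps, n_mfcc, n_classes=6):
    """Small LSTM classifier over per-frame MFCC vectors."""
    model = models.Sequential([
        layers.Input(shape=(timesteps, n_mfcc)),
        layers.LSTM(128),
        layers.Dense(64, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# X_lab, X_unlab: (N, timesteps, n_mfcc) MFCC arrays; y_lab: integer class ids
# X_pl, y_pl = pseudo_label(X_lab, y_lab, X_unlab)
# model = build_lstm(X_lab.shape[1], X_lab.shape[2])
# model.fit(np.concatenate([X_lab, X_pl]), np.concatenate([y_lab, y_pl]),
#           validation_split=0.2, epochs=30, batch_size=32)
```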