Emotion Modeling in Speech Signals: Discrete Wavelet Transform and Machine Learning Tools for Emotion Recognition System

IF 2.4 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Applied Computational Intelligence and Soft Computing. Pub Date: 2024-04-02. DOI: 10.1155/2024/7184018

K. Daqrouq, A. Balamesh, O. Alrusaini, A. Alkhateeb, A. S. Balamash


Citations: 0

Abstract



Speech emotion recognition (SER) is a challenging task because emotions are complex and subtle. This study proposes a novel approach to emotion modeling from speech signals that combines the discrete wavelet transform (DWT) with linear prediction coding (LPC). Several classifiers, namely support vector machine (SVM), K-Nearest Neighbors (KNN), Efficient Logistic Regression, Naive Bayes, Ensemble, and Neural Network, were evaluated for emotion classification on the EMO-DB dataset. Evaluation used the area under the curve (AUC), average prediction accuracy, and cross-validation. The results indicate that the KNN and SVM classifiers distinguished sadness from other emotions with high accuracy, and that Ensemble methods and Neural Networks also performed strongly on sadness classification. Efficient Logistic Regression and Naive Bayes were competitive but slightly less accurate than the other classifiers, with Efficient Logistic Regression showing the lowest accuracies overall. The proposed feature extraction method yielded the highest average accuracy, and combining it with formants or wavelet entropy improved classification accuracy further. The distinctive contribution of this study is that it investigates a combined feature extraction method and compares it against various other feature combinations. The aims of the investigation include improved classifier performance, high system effectiveness, and potential for use in emotion classification tasks. These findings can guide the selection of classifiers and feature extraction methods in future research and real-world applications. Further work can focus on refining the classifiers and exploring additional feature extraction techniques to improve emotion classification accuracy.
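The abstract names the feature pipeline (DWT combined with LPC) but not its parameters. As a minimal sketch of how such features could be computed, the following pure-Python example decomposes a frame with a Haar DWT and fits LPC coefficients to each subband using the autocorrelation method. The Haar wavelet, the three decomposition levels, the LPC order of 4, and all function names are illustrative assumptions, not the authors' configuration.

```python
import math


def haar_dwt(signal):
    """One level of the Haar DWT; returns (approximation, detail)."""
    n = len(signal) - len(signal) % 2  # ignore a trailing odd sample
    scale = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * scale for i in range(0, n, 2)]
    detail = [(signal[i] - signal[i + 1]) * scale for i in range(0, n, 2)]
    return approx, detail


def wavedec(signal, levels):
    """Multi-level decomposition: [detail_1, ..., detail_L, approx_L]."""
    bands, current = [], list(signal)
    for _ in range(levels):
        current, detail = haar_dwt(current)
        bands.append(detail)
    bands.append(current)
    return bands


def lpc(x, order):
    """LPC coefficients a_1..a_p for the model x[n] ~ sum_k a_k * x[n-k],
    solved with the autocorrelation method and Levinson-Durbin recursion."""
    r = [sum(x[i] * x[i + lag] for i in range(len(x) - lag))
         for lag in range(order + 1)]
    a = [0.0] * order
    err = r[0]
    if err <= 0.0:  # silent band: no energy to model
        return a
    for i in range(order):
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err  # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        err *= 1.0 - k * k  # residual prediction error
        if err <= 0.0:
            break
    return a


def dwt_lpc_features(signal, levels=3, order=4):
    """Concatenate the LPC coefficients of every DWT subband.
    levels and order are assumed values, not taken from the paper."""
    features = []
    for band in wavedec(signal, levels):
        features.extend(lpc(band, order))
    return features


# Synthetic stand-in for a speech frame (not EMO-DB data).
frame = [math.sin(0.3 * n) + 0.5 * math.sin(1.7 * n) for n in range(256)]
features = dwt_lpc_features(frame)
print(len(features))  # 16 = (levels + 1) * order
```

In practice a wavelet library such as PyWavelets would normally replace the hand-written Haar step, and the resulting vectors would be passed to a classifier such as KNN or SVM.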


Source journal

Applied Computational Intelligence and Soft Computing (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

CiteScore: 6.10

Self-citation rate: 3.40%

Publication volume: not listed

Review time: 21 weeks

About the journal: Applied Computational Intelligence and Soft Computing focuses on the disciplines of computer science, engineering, and mathematics. Its scope includes applications across all aspects of the natural and social sciences that employ computational intelligence and soft computing technologies. Although computational intelligence and soft computing are established fields, their new applications are still developing and can be regarded as an emerging field, which is the focus of this journal.