A filtering approach for speech emotion recognition using wavelet approximation coefficient

IF 5.2 | Zone 2, Engineering & Technology | Q1 ENGINEERING, MULTIDISCIPLINARY

Measurement | Pub Date: 2025-06-13 | DOI: 10.1016/j.measurement.2025.118165

Ravi, Sachin Taran

Volume 256, Article 118165

Citations: 0

Abstract

The understanding of emotions from human speech signals is one of the most attractive areas of research. However, it is also challenging due to the influence of language and accent on speech. This framework presents a solution for speech emotion recognition based on the wavelet approximation method. In this work, the Wavelet Approximation Speech Emotion Recognition (WaSER) model is presented. The 'db4' wavelet is employed to calculate the approximation coefficients. At the initial stage, silent segments of the raw speech signal are eliminated using wavelet approximation and energy thresholding, resulting in a filtered signal. Various experiments are performed for wavelet selection and threshold-value optimization. A set of prosodic and spectral features is then extracted from the filtered signal. Potential features are selected using the ReliefF algorithm, and these optimal features are used for further processing. An ablation experiment is also presented to demonstrate the potential of the proposed WaSER model. The WaSER model was tested on datasets in several languages, including RAVDESS, EMOVO, Emo-DB and ShEMO, attaining its highest accuracy of 94.6% on the ShEMO dataset. The ablation experiment shows that filtering the speech significantly enhances accuracy across all datasets while maintaining a balance between accuracy and sensitivity. The proposed model also drastically reduces response time compared with the existing state of the art.
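The abstract describes the filtering stage only at a high level: db4 approximation coefficients combined with energy thresholding to drop silent segments. The following is a minimal sketch of that idea in plain Python, under assumptions that are not taken from the paper: non-overlapping frames of `frame_len` samples, a level-2 decomposition with periodic extension, the mean-square energy of the approximation coefficients as the frame energy, and a threshold set to a fraction (`rel_threshold`) of the loudest frame's energy. The names `approx_coeffs` and `remove_silence` and all default values are hypothetical; the authors tuned the wavelet and threshold experimentally, and their exact procedure may differ.

```python
import math

# Daubechies 'db4' decomposition low-pass filter (8 taps, 4 vanishing moments),
# listed in the same order as PyWavelets' pywt.Wavelet('db4').dec_lo.
DB4_DEC_LO = [
    -0.010597401784997278, 0.032883011666982945, 0.030841381835986965,
    -0.18703481171888114, -0.02798376941698385, 0.6308807679295904,
    0.7148465705525415, 0.23037781330885523,
]


def approx_coeffs(x, level=1):
    """Return db4 approximation coefficients after `level` analysis stages.

    Each stage convolves with the low-pass filter under periodic extension
    and keeps every second output, so the length halves per stage.
    """
    a = list(x)
    for _ in range(level):
        prev = a
        n = len(prev)
        if n < 2:
            break
        a = [
            sum(h * prev[(2 * k + 1 - j) % n] for j, h in enumerate(DB4_DEC_LO))
            for k in range(n // 2)
        ]
    return a


def remove_silence(signal, frame_len=256, level=2, rel_threshold=0.05):
    """Keep only frames whose approximation-band energy exceeds a threshold.

    Hypothetical rule: threshold = rel_threshold * (largest frame energy).
    A trailing partial frame is dropped for simplicity.
    """
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    if not frames:
        return []
    energies = []
    for frame in frames:
        a = approx_coeffs(frame, level)
        energies.append(sum(c * c for c in a) / len(a))  # mean-square energy
    threshold = rel_threshold * max(energies)
    filtered = []
    for frame, energy in zip(frames, energies):
        if energy > threshold:  # treated as a non-silent frame
            filtered.extend(frame)
    return filtered


if __name__ == "__main__":
    # Near-silent frame, then a low-frequency tone, then true silence.
    tone = [math.sin(2 * math.pi * 5 * n / 256) for n in range(256)]
    sig = [0.001 * v for v in tone] + tone + [0.0] * 256
    print(len(remove_silence(sig)))  # -> 256
```

Measuring energy on the approximation band rather than on raw samples emphasizes low-frequency content, where much of the energy of voiced speech lies, while high-frequency noise in quiet segments falls mostly into the discarded detail band. If PyWavelets is available, `pywt.wavedec(frame, 'db4', level=2)[0]` yields comparable approximation coefficients, though with its own default boundary handling.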

Source journal

Measurement (Engineering & Technology: Engineering, Multidisciplinary)

CiteScore: 10.20

Self-citation rate: 12.50%

Articles published: 1589

Review time: 12.1 months

Journal description: Contributions are invited on novel achievements in all fields of measurement and instrumentation science and technology. Authors are encouraged to submit novel material whose ultimate goal is an advancement in the state of the art of: measurement and metrology fundamentals, sensors, measurement instruments, measurement and estimation techniques, measurement data processing and fusion algorithms, evaluation procedures and methodologies for plants and industrial processes, performance analysis of systems, processes and algorithms, mathematical models for measurement-oriented purposes, and distributed measurement systems in a connected world.